Preface



The Symposium on Theoretical Aspects of Computer Science (STACS) is held
annually, alternating between France and Germany. The current volume constitutes
the proceedings of the 16th STACS conference, organized jointly by the
Special Interest Group for Theoretical Computer Science of the Gesellschaft für
Informatik (GI) in Germany, and Maison de l’Informatique et des Mathématiques
Discrètes (MIMD) in France.

The conference took place in Trier - the oldest town in Germany, with more
than two millennia of history. Previous symposia of the series were held in Paris
(1984), Saarbrücken (1985), Orsay (1986), Passau (1987), Bordeaux (1988),
Paderborn (1989), Rouen (1990), Hamburg (1991), Cachan (1992), Würzburg
(1993), Caen (1994), München (1995), Grenoble (1996), Lübeck (1997), and
Paris (1998). All proceedings of the series have been published in the Lecture
Notes in Computer Science series of Springer-Verlag.

STACS has become one of the most important annual meetings in Europe for
the theoretical computer science community. This time, altogether 300 authors
from 36 countries on five continents submitted their papers. Each submission was
sent to five members of the program committee for review. During the program
committee session 51 out of the 146 submissions were accepted for presentation.
In two of the selected papers the same result was proved independently.
The program committee decided to include the final versions of both papers in
the proceedings (Icking, Klein, Langetepe: An Optimal Competitive Strategy for
Walking in Streets and Semrau, Schuierer: An Optimal Strategy for Searching
in Unknown Streets) and to have the result presented in a joint talk during the
conference. The program committee was impressed by the high scientific quality
of the submissions as well as the broad spectrum of topics they covered within
the area of theoretical computer science. In spite of this, a number of good papers
submitted had to be rejected due to general limitations of the conference.

The program committee consisted of S. Albers (Saarbrücken), R. Amadio
(Marseille), R. Cori (Bordeaux), J. Esparza (München), J. Hromkovic (Aachen),
C. Kenyon (Paris), J. Köbler (Ulm), D. Krizanc (Ottawa), Ch. Meinel (Trier,
chair), A. Petit (Cachan), S. Rudich (Pittsburgh), J. Sgall (Praha), R. Silvestri
(Roma), S. Tison (Lille, co-chair), and P. Widmayer (Zürich). We wish to thank
all the members for their work in evaluating the significance and scientific merits
of the submitted papers. The program committee was assisted by numerous
reviewers from all over the world whose names are listed on the next pages. We
would like to thank all these scientists for their efforts. Without their voluntary
work it would have been impossible to organize such a high quality conference.

We also thank the three invited speakers of this meeting for accepting our 
invitation and sharing with us their insights on interesting developments in our
area.



We acknowledge the support of the following institutions for this conference:
Deutsche Forschungsgemeinschaft, European Commission, Institut für Telematik
(Trier), and Universität Trier.

Finally, we would like to thank the members of the organizing committee 
consisting of J. Bern, C. Damm, Ch. Meinel, and M. Mundhenk. 

Trier, February 1999 Sophie Tison and Christoph Meinel 
Algorithms for Selfish Agents

Mechanism Design for Distributed Computation 



Noam Nisan* 

Institute of Computer Science, Hebrew U., Jerusalem and IDC, Herzliya. 



Abstract. This paper considers algorithmic problems in a distributed 
setting where the participants cannot be assumed to follow the algorithm 
but rather their own self-interest. Such scenarios arise, in particular, 
when computers or users aim to cooperate or trade over the Internet. As 
such participants, termed agents, are capable of manipulating the algo- 
rithm, the algorithm designer should ensure in advance that the agents’ 
interests are best served by behaving correctly. 

This exposition presents a model to formally study such algorithms. This 
model, based on the field of mechanism design, is taken from the author’s 
joint work with Amir Ronen, and is similar to approaches taken in the 
distributed AI community in recent years. Using this model, we demon- 
strate how some of the techniques of mechanism design can be applied 
towards distributed computation problems. We then exhibit some issues 
that arise in distributed computation which require going beyond the 
existing theory of mechanism design. 



1 Introduction 

A large part of research in computer science is concerned with protocols and 
algorithms for inter-connected collections of computers. The designer of such an 
algorithm or protocol always makes an implicit assumption that the participating 
computers will act as instructed - except, perhaps, for the faulty or malicious 
ones. 

With the emergence of the Internet as the platform of computation, this 
assumption can no longer be taken for granted. Computers on the Internet belong 
to different persons or organizations, and will likely do what is most beneficial 
to their owners - act “selfishly”. We cannot simply expect each computer on 
the Internet to faithfully follow the designed protocols or algorithms. It is more 
reasonable to expect that each selfish computer will try to manipulate it for its 
owners’ benefit. An algorithm or protocol intended for selfish computers must 
therefore be designed in advance for this kind of behavior! 

Such protocols and algorithms will likely involve payments (or other trade)
between the selfish participants. One can view this challenge (of designing protocols
and algorithms for selfish computers) as that of designing automated trade
rules for the Internet environment. The normal practices of human trade, while
clearly relevant, cannot be directly applied due to the much greater complexity
involved and due to the automated nature of the trade.

* This research was supported by grants from the Israeli Ministry of Science and the
Israeli Academy of Sciences.

The view taken in this paper is that of a systems engineer who has certain
technical goals for the global behavior of the Internet. We view the selfishness of
the participants as an obstacle to our goals, and we view the trade and payments
involved as a way to overcome this obstacle. In economic terms, we desire a
virtual “managed economy” of all Internet resources, but due to the selfishness
of the participants we are forced to obtain it using the “invisible hand” of “free
markets”. Our goal is to design the market rules so as to ensure the desired global
behavior.

We first present a formal model that allows studying these types of issues. 
The model relies on the rationality of the participants and is game-theoretic 
in nature. Specifically, it is based upon the theory of mechanism design. The 
model is directly taken from the author’s joint work with Amir Ronen [?], and
is similar in spirit to some models studied in the distributed AI community. 
After presenting the model we present some of the basic notions and results 
from mechanism design in our distributed computation setting. We do not in- 
tend here to give a balanced or exhaustive survey of mechanism design, but 
rather to pick and choose the notions that we feel are most applicable to our 
applications in distributed computation. Finally, we present some scenarios that 
arise in distributed computation that require going beyond the existing theory 
of mechanism design. 

Before getting into the model, we mention some of the application areas
we have in mind, and briefly survey some of the existing work in computer
science along this and similar tracks.



2 Sample Scenarios 

We briefly sketch below three (somewhat related) application areas that we feel
require these types of “selfish algorithms”. These application areas are each quite
wide in their scope, involve complicated optimizations of resources, and directly 
involve differing goals of the participants. Most of the works cited below lie in 
one of these areas. 



2.1 Resource Allocation 

The aggregate power of all computers on the Internet is huge. In a “dream world” 
this aggregate power will be optimally allocated online among all connected 
processors. One could imagine CPU-intensive jobs automatically migrating to 
CPU-servers, caching automatically done by computers with free disk space, 
etc. Access to data, communication lines, and even physical attachments (such 
as printers) could all be allocated across the Internet. This is clearly a difficult 
optimization problem even within tightly linked systems, and is addressed, in 
various forms and with varying degrees of success, by all distributed operating
systems. 

The same type of allocation over the Internet requires handling an additional 
problem: the resources belong to different parties who may not allow others to 
freely use them. The algorithms and protocols may, thus, need to provide some 
motivation for these owners to “play along”.

2.2 Routing 

When one computer wishes to send information to another, the data usually gets 
routed through various intermediate routers. So far this has been done volun- 
tarily, probably due to the low marginal cost of forwarding a packet. However, 
when communication of larger amounts of data becomes common (e.g. video), 
and bandwidth needs to be reserved under various quality of service (QoS) proto- 
cols, this altruistic behavior of the routers may no longer hold. If so, we will have 
to design protocols specifically taking the routers’ self-interest into account. 

2.3 Electronic Trade 

Much trade is taking place on the Internet and much more is likely to take 
place on it. Such trade may include various financial goods (stocks, currency 
exchange, options), various information goods (video-on-demand, database ac- 
cess, music), many services (help desk, flower delivery, data storage), as well as 
real goods (books, groceries, computers). This trade will likely involve sophisticated
programs communicating with each other trying to find “the best deal”.
In addition, this will also raise the possibility of various brokerage services such 
as information providers, aggregators, and other types of agents. Clearly any 
system that enables such programs to efficiently trade with each other needs to 
offer general economic efficiency while very strongly taking into account the fact 
that all participants have totally differing goals. 

3 Existing Work 

Game theory, Economics, and Computer Science 

In recent years there have been many works that tried to introduce economic 
or game-theoretic aspects into computational questions. The approach presented 
here is part of this trend, but is much narrower, taking specifically the direction 
of mechanism design. The reader interested in the wider view may start his
exploration e.g. with the surveys [?], the book [?], the web sites [?], or
the papers in the conference [?].

Mechanism design 

The field of mechanism design (also known as implementation theory) aims
to study how privately known preferences of many people can be aggregated
towards a “social choice”. The main motivation of this field is micro-economic,
and the tools are game-theoretic. Emphasis is put on the implementation of
various types of auctions.

In the last few years this field has received much interest, especially due to its
influence on large privatizations and spectrum allocations [?]. An introduction
to this field can be found in [17, chapter 23] and [20, chapter 10], and an influential
web site in [?].

Mechanism design in Computer Science 

One may identify three motivations for combining mechanism design with 
computational questions. 

Auction implementation: As auctions become more popular as well as 
more complicated, they are often implemented using computers and com- 
puter networks. Many computational implementation questions result. These 
range from purely combinatorial ones regarding optimization in complex 
combinatorial auctions to systems questions regarding communication and 
performance issues in wide-scale auctions. 

Leveraging Market Power: In the “real world” the invisible hand of free 
markets seems to yield surprisingly good results for complex optimization 
problems. This occurs despite the many underlying difficulties: decentralized 
control, uncertainties, information gaps, limited computational power, etc. 
One is tempted to apply similar market-based ideas in computational sce- 
narios with similar complications, in the hope of achieving similarly good 
results. 

Handling Selfishness: This is the approach taken here, and it views mech- 
anism design introduced into computational problems as a necessary evil, 
required to deal with the differing goals of the participants. 

Even though these motivations are different philosophically, research often
combines aspects from all approaches. Below we briefly sketch some of the previous
work introducing mechanism design into different branches of computer
science, without attempting to classify it further.

Distributed AI 

In the last decade or so, researchers in AI have studied cooperation and
competition among “software agents”. The meaning of agents here is very broad,
incorporating attributes of code-mobility, artificial-intelligence, user-customization,
and self-interest.

A subfield of this general direction of research takes a game theoretic analysis
of agents’ goals, and in particular uses notions from mechanism design [?].
A related subfield of Distributed AI, sometimes termed market-based
computation, aims to leverage the notions of free markets in order
to solve distributed problems. These subfields of DAI are related to our work.

Communication Networks 

In recent years researchers in the field of network design adopted a game
theoretic approach (see e.g. [?]). In particular mechanism design was applied
to various problems including resource allocation, cost sharing, and pricing [?].

4 The Model 

In this section we formally present the model. It is taken from the author’s joint
work with Amir Ronen [?].

The model is concerned with computing functions that depend on inputs 
that are distributed among n different agents. A problem in this model has, in 
addition to the specification of the function to be computed, a specification of 
the goals of each of the agents. The solution, termed a mechanism, includes, in 
addition to an algorithm computing the function, payments to be handed out 
to the agents. These payments are intended to motivate the agents to behave
“correctly”.

Subsection 4.1 describes what a mechanism design problem is. In subsec- 
tion 4.2 we define what a good solution is: an implementation with dominant 
strategies. Subsection 4.3 defines a special class of good solutions: truthful im- 
plementations, and states the well-known fact that restricting ourselves to such 
solutions loses no generality. 

4.1 Mechanism Design Problem Description 

Intuitively, a mechanism design problem has two components: the usual algorith- 
mic output specification, and descriptions of what the participating agents want, 
formally given as utility functions over the set of possible outputs (outcomes).

Definition 1 (Mechanism Design Problem) A mechanism design problem
is given by an output specification and by a set of agents’ utilities. Specifically:

1. There are n agents, each agent i has available to it some private input
$t^i \in T^i$ (termed its type). Everything else in this scenario is public knowledge.

2. The output specification maps to each type vector $t = t^1 \ldots t^n$ a set of allowed
outcomes $o$.

3. Each agent i’s preferences are given by a real valued function: $v^i(t^i, o)$, called
its valuation. This is a quantification of its value from the outcome $o$, when
its type is $t^i$, in terms of some common currency. I.e. if the mechanism’s
outcome is $o$ and in addition the mechanism hands this agent $p^i$ units of this
currency, then its utility will be $u^i = p^i + v^i(t^i, o)$¹. This utility is what the
agent aims to optimize.

In this paper we only discuss optimization problems. In these problems the
outcome specification is to optimize a given objective function. We present the
definition for minimization problems.

¹ This is termed “semi-linear utility”. In this paper we limit ourselves to this type of
utilities.
Definition 2 (Mechanism Design Optimization problem) This is a mechanism
design problem where the outcome specification is given by a positive real
valued objective function $g(o, t)$ and a set of feasible outcomes $F$. The required
output is the outcome $o \in F$ that minimizes $g$.

4.2 The Mechanism 

Intuitively, a mechanism solves a given problem by assuring that the required
outcome occurs, when agents choose their strategies so as to maximize their own
selfish utilities. A mechanism thus needs to ensure that players’ utilities (which
it can influence by handing out payments) are compatible with the algorithm.

Notation: We will denote $(a^1, \ldots, a^{i-1}, a^{i+1}, \ldots, a^n)$ by $a^{-i}$. $(a^i, a^{-i})$ will denote
the tuple $(a^1, \ldots, a^n)$.

Definition 3 (A Mechanism) A mechanism $m = (o, p)$ is composed of two elements:
An outcome $o = o(a)$, and an n-tuple of payments $p^1(a) \ldots p^n(a)$. Specifically:

1. The mechanism defines for each agent i a family of strategies $A^i$. Agent i
can choose to perform any $a^i \in A^i$.

2. The first thing a mechanism must provide is an outcome function $o = o(a^1 \ldots a^n)$.

3. The second thing a mechanism provides is a payment $p^i = p^i(a^1 \ldots a^n)$ to
each of the agents.

4. We say that a mechanism is an implementation with dominant strategies
(or in short just an implementation) if
- For each agent i and each $t^i$ there exists a strategy $a^i \in A^i$, termed
dominant, such that for all possible strategies of the other agents $a^{-i}$,
$a^i$ maximizes agent i’s utility. I.e. for every $a'^i \in A^i$, if we define
$o = o(a^i, a^{-i})$, $o' = o(a'^i, a^{-i})$, $p^i = p^i(a^i, a^{-i})$, $p'^i = p^i(a'^i, a^{-i})$, then
$v^i(t^i, o) + p^i \ge v^i(t^i, o') + p'^i$.

- For each tuple of dominant strategies $a = (a^1 \ldots a^n)$ the outcome $o(a)$
satisfies the specification.
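
When every strategy set is finite, the dominance condition of Definition 3 can be checked by exhaustive search. The following Python sketch is only an illustration under that finiteness assumption; the name `is_dominant` and the toy two-bidder instance are hypothetical and do not come from the paper.

```python
from itertools import product

def is_dominant(i, t_i, a_i, strategy_sets, outcome, payment, valuation):
    """Check by brute force that action a_i is dominant for agent i of type t_i.

    Assumes finite strategy sets. outcome(a) returns the outcome of the action
    tuple a, payment(i, a) returns p^i(a), and valuation(i, t, o) returns v^i(t, o).
    """
    others = [s for j, s in enumerate(strategy_sets) if j != i]
    for rest in product(*others):
        def profile(x):
            # Re-insert agent i's action x among the other agents' actions.
            return rest[:i] + (x,) + rest[i:]

        def util(x):
            a = profile(x)
            return valuation(i, t_i, outcome(a)) + payment(i, a)

        # a_i must be at least as good as every alternative action.
        if any(util(alt) > util(a_i) for alt in strategy_sets[i]):
            return False
    return True

# Toy instance (hypothetical): two bidders, integer bids 0..3,
# the highest bid wins and ties go to the lowest index.
bids = [range(4), range(4)]
winner = lambda a: max(range(len(a)), key=lambda j: a[j])
value = lambda i, t, o: t if o == i else 0
second_price = lambda i, a: -min(a) if winner(a) == i else 0
first_price = lambda i, a: -a[i] if winner(a) == i else 0

print(is_dominant(0, 2, 2, bids, winner, second_price, value))
print(is_dominant(0, 2, 2, bids, winner, first_price, value))
```

In this toy instance, bidding one's true value passes the check when the winner pays the other bid and fails when the winner pays his own bid.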

4.3 The Revelation Principle 

The simplest types of mechanisms are those in which the agents’ strategies are 
to simply report their types. 

Definition 4 (Truthful Implementation) We say that a mechanism is truth- 
ful if 

1. For all i, and all $t^i$, $A^i = T^i$, i.e. the agents’ strategies are to report their
type. (This is called a direct revelation mechanism.)

2. Truth-telling is a dominant strategy, i.e. $a^i = t^i$ satisfies the definition of a
dominant strategy above.
A simple observation, known as the revelation principle, states that without
loss of generality one can concentrate on truthful implementations.

Proposition 4.1 ([?], page 871) If there exists a mechanism that implements
a given problem with dominant strategies then there exists a truthful implementation
as well.

Proof: (sketch) We let the truthful implementation simulate the agents’ strategies.
I.e. given a mechanism $(o, p^1, \ldots, p^n)$, with dominant strategies $a^i(t^i)$, we
can define a new one by $o^*(t^1 \ldots t^n) = o(a^1(t^1) \ldots a^n(t^n))$ and
$(p^*)^i(t^1 \ldots t^n) = p^i(a^1(t^1) \ldots a^n(t^n))$. □
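
The simulation argument in the proof sketch can be illustrated in a few lines of Python. This is a minimal sketch: `make_truthful`, `cents_auction` and `to_cents` are hypothetical names, and the dominant strategies of the original mechanism are assumed to be given as functions from types to actions.

```python
def make_truthful(mechanism, dominant_strategies):
    """Build a direct revelation mechanism from an indirect one.

    mechanism(actions) returns (outcome, payments). dominant_strategies[i]
    maps agent i's type t^i to its dominant action a^i(t^i), which is
    assumed to exist in the original mechanism.
    """
    def direct(types):
        # The new mechanism plays each agent's dominant action on its behalf.
        actions = [s(t) for s, t in zip(dominant_strategies, types)]
        return mechanism(actions)
    return direct

# Hypothetical indirect mechanism: agents bid in whole cents, the highest bid
# wins and pays the second highest bid (payments are reported in dollars).
def cents_auction(bids_in_cents):
    order = sorted(range(len(bids_in_cents)),
                   key=lambda i: bids_in_cents[i], reverse=True)
    win, runner_up = order[0], order[1]
    payments = [0.0] * len(bids_in_cents)
    payments[win] = -bids_in_cents[runner_up] / 100.0
    return win, payments

# An agent whose type is a dollar valuation converts it to cents.
to_cents = lambda t: round(100 * t)
direct_auction = make_truthful(cents_auction, [to_cents] * 3)
print(direct_auction([3.0, 5.0, 4.0]))
```

The wrapped mechanism asks only for types, and its outcome and payments coincide with those of the original mechanism when every agent plays its dominant strategy.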

5 Applying Existing Mechanism Design Theory 

In this section we present several well known mechanisms. While these mecha- 
nisms are the usual ones one would find in a standard text on mechanism design, 
we present them in a distributed-computation setting. The implementations pro- 
vided are all truthful ones, i.e. they follow this pattern: 

1. Each agent reports its input to the mechanism. 

2. The mechanism computes the desired outcome based on the reported types. 

3. The mechanism computes payments for each agent. 

The challenge in these examples is to determine these payments so as to ensure
that the truth is indeed a dominant strategy for all agents.

5.1 Maximum 
Story²:

A single server is serving many clients. At a certain time, the server can
serve exactly one request. Each client has a private valuation for his request
being served. (The valuation is 0 if the request is not served.) We want the most
valuable request to be served.

Failed attempts:

One might first attempt to simply ignore all payments (i.e. set $p^i = 0$ for all
i). This however is clearly insufficient since it motivates each agent to exaggerate
his valuation, so as to get his request executed. The second attempt would be to
let the winning agent pay his declaration. I.e. set $p^i = -t'^i$ for the agent i that
declared the highest $t'^i$ (and $p^j = 0$ for all others). This also fails since the agent
with highest $t^i$ is motivated to reduce his declaration to slightly above the second
highest valuation offered. This will result in his request still being served, and
his payment reduced. In case agent i has imperfect information about the others
this strategic behavior may lead him to accidentally declare a lower value than
the second valuation, which will result in a sub-optimal allocation.

² This is an auction and the solution presented is Vickrey’s well-known second price
auction [?].

Solution: 

The agent that offers the highest valuation for his request pays the second
highest price offered. I.e. $p^i = -t^j$, where i offers the highest price and j the
second highest. All other agents have a payment of 0.

Analysis: 

To see why this is a truthful implementation, consider agent i and consider
a lie $t'^i \ne t^i$. If this lie does not change the allocation, then nothing is gained
or lost by agent i since his payment is also unaffected by his own declaration.
If this lie gets his request served, then $t'^i > t^j > t^i$, where $t^j$ is the highest
declaration among the other agents, and he gains $t^i$ of utility
from his valuation of the served request, but he loses $t^j$ on payments, thus his
total utility would be $t^i - t^j < 0$, as opposed to 0 in the case of the truth. On
the other hand, if his lie makes him lose the service, then his utility is now 0, as
opposed to a positive number which it was in the truthful case.
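
The second price rule and the case analysis above can be tried out on a small instance. The Python sketch below is illustrative only; the function names are hypothetical, and ties are broken in favour of the lowest index, a choice the text does not specify.

```python
def second_price_auction(declared):
    """Return (winner, payments) for a list of declared valuations.

    The highest declaration wins and pays the second highest declaration;
    payments are negative numbers, as in the text. Needs at least two agents.
    Ties are broken towards the lowest index (an assumption of this sketch).
    """
    if len(declared) < 2:
        raise ValueError("need at least two agents")
    order = sorted(range(len(declared)), key=lambda i: declared[i], reverse=True)
    win, runner_up = order[0], order[1]
    payments = [0.0] * len(declared)
    payments[win] = -float(declared[runner_up])
    return win, payments

def utility(i, true_value, declared):
    """Utility of agent i: valuation of the outcome plus payment received."""
    win, payments = second_price_auction(declared)
    return (true_value if win == i else 0.0) + payments[i]

# Agent 0 has true valuation 7; the other agents declare 5 and 3.
truthful = utility(0, 7, [7, 5, 3])
best_lie = max(utility(0, 7, [lie, 5, 3]) for lie in range(12))
print(truthful, best_lie)
```

Over all integer declarations from 0 to 11, no lie gives agent 0 more utility than the truth, in line with the analysis.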

5.2 Threshold 
Story³:

A single cache is shared by many processors. When an item is entered into
the cache, all processors gain faster access to this item. Each processor i will
save $t^i$ in communication costs if a certain item X is brought into the cache. (I.e.
its valuation of loading X is $t^i \ge 0$, and of not loading it, 0.) The cost of loading
X is a publicly known constant C. We want to load X iff $\sum_i t^i > C$.

Failed attempts:

We may first attempt to just divide the total cost between the n participating
agents, i.e. set $p^i = -C/n$ for all i. This however motivates any agent with
$t^i > C/n$ to announce his valuation as greater than C, and thus assure that X
is loaded. We may, as a second attempt, let each agent pay the amount declared
(or perhaps something proportional to it). In this case, however, we will be faced
with a free-rider problem, where agents will tend to report lower valuations than
the true ones so as to reduce their payments. This, when done by several agents,
may result in the wrong decision of not loading X.

Solution:

In case X is loaded, each agent pays a sum equal to the minimum declaration
required from him in order to load X, given the others’ declarations. I.e. the only
case where $p^i \ne 0$ is when $\sum_{j \ne i} t^j < C < \sum_j t^j$, in which case
$p^i = \sum_{j \ne i} t^j - C$ (a negative number).

The analysis is left to the reader. Alternatively, this example may be seen to
be a special case of the example below.

This example can be generalized to the case where $t^i$ can be negative as well.

³ This is known as the “public project” problem, and the solution is known as the
Clarke tax [?].
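
The payment rule of the threshold example (Section 5.2) can be written down directly. The Python sketch below assumes the rule as stated there: an agent pays only when the item is loaded and the other agents’ declarations alone would not cover the cost. The names `clarke_threshold` and `utility` are hypothetical.

```python
def clarke_threshold(declared, cost):
    """Return (load, payments) for declared savings and the public cost C.

    The item is loaded iff the declarations sum to more than the cost. An
    agent whose declaration is needed to reach the cost pays the shortfall of
    the others; payments are negative numbers or zero.
    """
    load = sum(declared) > cost
    payments = []
    for i in range(len(declared)):
        others = sum(declared[:i]) + sum(declared[i + 1:])
        pivotal = load and others < cost
        payments.append(float(others - cost) if pivotal else 0.0)
    return load, payments

def utility(i, true_value, declared, cost):
    """Utility of agent i: saving if the item is loaded, plus its payment."""
    load, payments = clarke_threshold(declared, cost)
    return (true_value if load else 0.0) + payments[i]

# Three processors declare savings 4, 3 and 2; loading costs 6.
print(clarke_threshold([4, 3, 2], 6))
```

Here only the first processor is pivotal: without its declaration the others reach 5, which is below the cost of 6, so it pays the difference of 1.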
5.3 Shortest Path
Story:

We have a communication network modeled by a directed graph G, and two
special nodes in it x and y. Each edge e of the graph is an agent. Each agent e
has private information (its type) $t^e \ge 0$ which is the agent’s cost for sending a
single message along this edge. The goal is to find the cheapest path from x to
y (so as to send a single message from x to y). I.e. the set of feasible outcomes are
all paths from x to y, and the objective function is the path’s total cost. Agent
e’s valuation is 0 if his edge is not part of the chosen path, and $-t^e$ if it is. We
will assume for simplicity that the graph is bi-connected.

Solution:

The following mechanism ensures that the dominant strategy for each agent
is to report his true type $t^e$ to the mechanism. When all agents honestly report
their costs, the cheapest path is chosen: The outcome is obtained by a simple
shortest path calculation. The payment $p^e$ given to agent e is 0 if e is not in
the shortest path and $p^e = d_{G-e} - (d_G - t'^e)$ if it is. Here $t'^e$ is the agent’s
reported input (which may be different from its actual one), $d_G$ is the length
of the shortest path (according to the inputs reported), and $d_{G-e}$ is the length
of the shortest path that does not contain e (again according to the reported
types).

Analysis: 

First notice that if the same shortest path is chosen with $t'^e$ as with $t^e$
then the payment and thus utility of the agent does not change. A lie $t'^e > t^e$
will cause the algorithm to choose the shortest path that does not contain e
as opposed to the (correct) one which does contain it iff $d_{G-e} - d_G < t'^e - t^e$,
where $d_G$ is computed from the true type $t^e$.

This directly implies that e’s utility would have been positive had e been chosen
in the path (as opposed to 0 when it is not chosen), thus the truth is better. A
similar argument works to show that $t'^e < t^e$ is worse than the truth.

Many other graph problems, where agents are edges, and their valuations 
proportional to the edges’ weights, can be implemented by a VCG mechanism. 
In particular minimum spanning tree and max-weight matching seem natural 
problems in this setting. A similar solution applies to the more general case 
where each agent holds some subset of the edges. 

Algorithmic Problem: How fast can the payment functions be computed?
Can it be done faster than computing n versions of the original problem? For the
shortest paths problem we get the following equivalent problem: given a directed
graph G with non-negative weights, and two vertices in it x, y, find, for each
edge e in the graph, the shortest path from x to y that does not use e. Using
Dijkstra’s algorithm for each edge on the shortest path gives an $O(nm \log n)$
algorithm. Is anything better possible? Maybe $O(m \log n)$?
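
The naive approach mentioned above, one additional Dijkstra run for each edge of the chosen path, can be sketched as follows. The graph representation (a dictionary from directed edges to reported costs) and the function names are assumptions of this sketch. It also assumes that removing any single path edge leaves x and y connected, as the bi-connectivity assumption is meant to guarantee.

```python
import heapq

def shortest_path(edges, source, target, skip=None):
    """Dijkstra over edges ({(u, v): cost}), ignoring the edge `skip`.

    Returns (distance, list of edges on a shortest path), or
    (infinity, []) if the target is unreachable.
    """
    adj = {}
    for (u, v), c in edges.items():
        if (u, v) != skip:
            adj.setdefault(u, []).append((v, c))
    dist, prev, done = {source: 0.0}, {}, set()
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, c in adj.get(u, []):
            if d + c < dist.get(v, float("inf")):
                dist[v] = d + c
                prev[v] = u
                heapq.heappush(heap, (d + c, v))
    if target not in dist:
        return float("inf"), []
    path, node = [], target
    while node != source:
        path.append((prev[node], node))
        node = prev[node]
    return dist[target], path[::-1]

def path_payments(reported, source, target):
    """Chosen path and payments p^e = d_{G-e} - (d_G - t'^e) for its edges."""
    d_g, path = shortest_path(reported, source, target)
    payments = {}
    for e in path:
        d_without_e, _ = shortest_path(reported, source, target, skip=e)
        payments[e] = d_without_e - (d_g - reported[e])
    return path, payments

reported = {("x", "a"): 1, ("a", "y"): 2, ("x", "y"): 5}
print(path_payments(reported, "x", "y"))
```

In this small graph the chosen path costs 3 and the only alternative costs 5, so each edge on the path is paid its reported cost plus the slack of 2.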
5.4 Utilitarian Functions

Arguably the most important positive result in mechanism design is what is
usually called the generalized Vickrey-Groves-Clarke (VCG) mechanism [?].
All previous examples are, in fact, VCG mechanisms. In this section we
present the general case.

The VCG mechanism applies to mechanism design optimization problems
where the objective function is simply the sum of all agents’ valuations.

Definition 5 An optimization mechanism design problem is called utilitarian if
its objective function satisfies $g(o, t) = \sum_i v^i(t^i, o)$.

Definition 6 We say that a direct revelation mechanism $m = (o(t), p(t))$ belongs
to the VCG family if

1. $o(t) \in \arg\max_o (\sum_{i=1}^n v^i(t^i, o))$.

2. $p^i(t) = \sum_{j \ne i} v^j(t^j, o(t)) + h^i(t^{-i})$ where $h^i()$ is an arbitrary function of $t^{-i}$.

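Theorem 5.1 (Groves [?]) A VCG mechanism is truthful.

Proof: (sketch) Let $d^1, \ldots, d^n$ denote the declarations of the agents and $t^1, \ldots, t^n$
denote their real types. Suppose that truth telling is not a dominant strategy,
then there exist $d, i, t, d'^i$ such that

$$v^i(t^i, o(d^{-i}, t^i)) + \sum_{j \ne i} v^j(d^j, o(d^{-i}, t^i)) + h^i(d^{-i}) < v^i(t^i, o(d^{-i}, d'^i)) + \sum_{j \ne i} v^j(d^j, o(d^{-i}, d'^i)) + h^i(d^{-i})$$

But then, cancelling $h^i(d^{-i})$ on both sides,

$$v^i(t^i, o(d^{-i}, t^i)) + \sum_{j \ne i} v^j(d^j, o(d^{-i}, t^i)) < v^i(t^i, o(d^{-i}, d'^i)) + \sum_{j \ne i} v^j(d^j, o(d^{-i}, d'^i))$$

in contradiction to the definition of $o()$, which maximizes this sum for the declarations $(d^{-i}, t^i)$. □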

Thus a VCG mechanism essentially provides a solution for any utilitarian problem
(except for the possible problem that there might be dominant strategies
other than truth-telling). It is known that (under mild assumptions) VCG mechanisms are
the only truthful implementations for utilitarian problems ([?]).
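
For a finite set of feasible outcomes, Definition 6 can be evaluated by brute force. The Python sketch below is a minimal illustration: the function `vcg`, its argument layout and the single-item example are hypothetical choices. The optional function `h` receives only the other agents’ declarations, as the definition requires.

```python
def vcg(outcomes, valuations, declared, h=None):
    """Brute-force VCG mechanism over a finite set of outcomes.

    valuations[i](t, o) returns v^i(t, o); declared is the list of declared
    types. h(i, others), if given, may depend only on the declarations of the
    agents other than i. Returns (chosen outcome, list of payments).
    """
    outcomes = list(outcomes)
    n = len(declared)

    def welfare(o, skip=None):
        # Sum of declared valuations of outcome o, optionally leaving one agent out.
        return sum(valuations[j](declared[j], o) for j in range(n) if j != skip)

    chosen = max(outcomes, key=welfare)
    payments = []
    for i in range(n):
        others = declared[:i] + declared[i + 1:]
        extra = h(i, others) if h is not None else 0.0
        payments.append(welfare(chosen, skip=i) + extra)
    return chosen, payments

# Single-item example: outcome o is the index of the served agent.
n = 3
valuations = [(lambda t, o, i=i: float(t) if o == i else 0.0) for i in range(n)]
# Choosing h^i as minus the best declaration of the others gives second price payments.
second_price_h = lambda i, others: -float(max(others))
print(vcg(range(n), valuations, [7, 5, 3], h=second_price_h))
```

With this choice of `h`, the agent declaring 7 is served and pays 5 while the others pay nothing, which is the second price auction of Section 5.1.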



5.5 More Issues in Mechanism Design 

The examples presented here demonstrate only the most basic notions from the 
field of mechanism design. Many more issues addressed by the theory of mech- 
anism design are applicable to the distributed computation setting. We briefly 
mention just some of the issues commonly studied by mechanism design (and 
other branches of game theory) that we feel may find applications in distributed 
computation. 
Bayesian-Nash equilibrium: Our notion of a solution was very strong,
requiring dominant strategies. Weaker notions of equilibrium are also often 
considered, in particular Bayesian-Nash equilibrium. 

Non semi-linear utilities: We assumed that the utility of each agent is 
additive in the money. More general types of utilities may be considered, 
where money influences the utility in an arbitrary manner. 

Budgets: We did not put any requirements on the sums of money involved 
in a mechanism. At least two types of constraints are widely studied: con- 
straining the total money spent by the mechanism (either to as large a neg- 
ative amount as possible, or to 0 - budget balance), and considering budget 
limitations of the agents. 

Common value models: We assumed that each agent has a known valu- 
ation function that is independent from the others. One may alternatively 
assume a valuation that is common to all agents but is not fully known by 
them. 

Repeated Games: We only considered a single instance of a problem. One 
may clearly consider repeated instances. 

Coalitions: We only considered manipulation by a single agent. Clearly one 
may study coalitions of agents. 

6 Beyond Existing Mechanism Design 

We feel that the application of existing mechanism design in distributed com- 
putation, as demonstrated above, is just a first step. Many of the considerations 
of distributed computation are quite different from the ones usually considered 
in mechanism design. Addressing these considerations will thus require new re- 
search. In this section we exhibit several scenarios in distributed computation 
that raise questions that indeed go beyond the current scope of mechanism de- 
sign. 

6.1 Task Scheduling 
Story: 

A computer has k tasks it wishes to execute, and can execute each of them on 
any one of n servers. Each server i knows, for every task j, the time it requires 
to execute this task. Each server’s cost is proportional to the time it spends on 
executing the tasks assigned to it. Our goal is to have all tasks completed as 
soon as possible (i.e. to minimize the completion time of the last task.) 

This problem was considered in [?]. Here are some of the issues raised by
this problem and addressed there. Similar issues arise in many other problems 
in distributed computation. 

Issues: 

Non-utilitarian Problem: The goal in this example is non-utilitarian. 
Thus, the VCG mechanism cannot be applied and new mechanisms need to 
be invented. 
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Impossibility: It is possible to prove that no mechanism perfectly solves 
this problem. As is common in Computer Science, one should try to overcome 
this impossibility. In particular, the following approaches may be considered 
(and were all studied in CHI): 

Approximation: Find a mechanism that approximates the optimal so- 
lution as well as possible. 

Randomization: In Computer Science as well as in game theory,
randomization often helps. It turns out that for this problem, randomized
mechanisms can provably do better than deterministic ones.

Model Extensions: Every model is an imperfect abstraction of reality.
One may incorporate useful attributes of reality into the model so as to
make an impossible result possible. In [?] the model was extended by
assuming that the mechanism need only compute the payments after
the tasks were actually executed, giving it additional information.
Computational Intractability: Even from a purely algorithmic point 
of view, the task scheduling problem is intractable (NP-complete). When
adding the requirements of a mechanism things only get worse. In particu- 
lar, standard ways of overcoming the computational intractability (such as 
tractable approximations) have complicated interactions with the require- 
ments of mechanism design. 



6.2 Maximum Independent Set 
Story: 

There are n processors connected in a linear array (i.e. each processor i
is connected to i − 1 and to i + 1). Each processor wants to execute a single
job, and values it at $t^i \ge 0$. The problem is that executing the job requires
exclusive access to the common link with each of its neighbors. Thus no two
consecutive processors can execute their job. Our goal is to execute the set of
tasks with maximal valuation, i.e. to find an independent set S of processors
that maximizes $\sum_{i \in S} t^i$.

Model Restriction: 

In this story we want to find a decentralized solution. I.e. we want to design 
a protocol, that runs on these computers, using only the available communica- 
tion links, and without assuming any central trusted computer, or any other 
communication links. 

Solution: 

Our protocol has two phases: a left-to-right phase and a right-to-left phase.
In the left-to-right phase, each processor i places a bid $R^i$ for the link on its right.
These offers are computed by each processor in turn as follows: $R^1 = t^1$, and
for $1 < i < n$, $R^i = \max(t^i - R^{i-1}, 0)$. In the right-to-left phase each processor
places a bid $L^i$ on the link to its left as follows: $L^n = t^n$, and for $1 < i < n$,
$L^i = \max(t^i - L^{i+1}, 0)$. Processor i wins the left link iff $L^i > R^{i-1}$ and wins the
right link iff $R^i > L^{i+1}$. It can execute its task (i.e. is chosen to be in S) if it has
won both links. In this case its payment is $-p^i = R^{i-1} + L^{i+1}$ (i.e. the second
price on each of the links it has won).
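The two phases can be sketched as follows. This is a centralized simulation under stated assumptions, not the distributed protocol itself: the function names are hypothetical, an end processor is treated as automatically winning the link it does not have (with a zero second price for it), and the valuations are chosen so that no ties occur.

```python
from itertools import combinations

def two_phase_protocol(t):
    """Simulate both bidding phases; processor i (1-based) has valuation t[i-1]."""
    n = len(t)
    # 1-based arrays padded with zeros, so R[0] and L[n+1] mean "no neighbor".
    R = [0] * (n + 2)   # R[i]: bid of processor i for the link on its right
    L = [0] * (n + 2)   # L[i]: bid of processor i for the link on its left
    for i in range(1, n):                  # left-to-right phase, i = 1..n-1
        R[i] = max(t[i - 1] - R[i - 1], 0)
    for i in range(n, 1, -1):              # right-to-left phase, i = n..2
        L[i] = max(t[i - 1] - L[i + 1], 0)
    chosen, payments = [], {}
    for i in range(1, n + 1):
        wins_left = (i == 1) or L[i] > R[i - 1]
        wins_right = (i == n) or R[i] > L[i + 1]
        if wins_left and wins_right:
            chosen.append(i)
            payments[i] = R[i - 1] + L[i + 1]   # second price on each link won
    return chosen, payments

def best_independent_set(t):
    """Brute-force maximum weight independent set on the linear array."""
    n = len(t)
    best = (0, ())
    for r in range(n + 1):
        for S in combinations(range(1, n + 1), r):
            if all(b - a > 1 for a, b in zip(S, S[1:])):
                best = max(best, (sum(t[i - 1] for i in S), S))
    return best

t = [1, 8, 2, 4, 16, 32]
print(two_phase_protocol(t))     # chosen processors and their payments
print(best_independent_set(t))   # reference answer by exhaustive search
```

On instances with a unique optimum the chosen set coincides with the brute-force answer. With ties, the strict inequalities may leave a link unassigned, which is where the arbitrary tie-breaking mentioned in the analysis comes in.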

Analysis: 

There are many issues to consider here: 

Algorithmic correctness: One may verify that $R^i$ is the difference between
the weight of the maximum weight independent set in $1 \ldots i-1$ and the weight
of the maximum weight independent set in $1 \ldots i$. Similarly, $L^i$ is the difference
between the weights of the maximum weight independent sets in $i+1 \ldots n$
and $i \ldots n$. Clearly i should be chosen to be in S if $t^i > L^{i+1} + R^{i-1}$ (ties
can be broken arbitrarily), which is exactly what this protocol does. This
protocol can be viewed as a dynamic programming solution of this problem.
Domination of the Truth: Assume that the players’ strategies are limited
to acting according to some fixed valuation $t'^i$. Such a model may be called
the “honest but selfish” case. In this case one may observe that the protocol
implements the VCG mechanism, which is a solution since the problem is indeed
utilitarian.

Dishonesty: A more general model would allow all strategies made possible
by the protocol. In this case the processors could act according to a different
$t'^i$ in each phase. One may verify that in this model the truth is no longer
dominant. Yet, truth is still a Nash equilibrium.

Ensuring Honesty: There are various ways to augment the model so as to
force the processors to be consistent in both phases, and thus essentially
force the “honest but selfish” situation. In particular, if processors i − 1
and i + 1 can communicate with each other then they can catch i’s
dishonesty. Such communication may alternatively be implicitly achieved by using
cryptographic signatures.

Decentralized Payments: The payments in this solution were to be given 
to some party outside of the n involved processors. It would have been nice 
to have a mechanism where the payments are only transferred between con- 
nected processors. 



6.3 Decentralized Auction 
Story: 

A single item is to be auctioned over the Internet among n humans (each 
with his own computer). 

Restriction: 

There is no trusted entity. In particular we do not trust the auctioneer to 
faithfully execute the auction rules or to keep any secrets. In the absence of such 
a trusted entity we would like to ensure two goals: 

— The auction is executed according to the published auction rules (e.g. second 
price). 
— No information about bids is leaked to any participant, beyond the results
of the auction which become public knowledge. I.e. only the identity of the
winner (but not his bid), and the amount of the second highest bid (but not
the identity of the bidder) become known.



Solution: 

The celebrated “oblivious circuit evaluation” cryptographic protocols [?]
exactly achieve this goal (as long as not too many of the participants collude
to lie). These cryptographic protocols can faithfully carry out any distributed 
computation without leaking any information to the participants. What cannot, 
in principle, be ensured by cryptography is that the participants reveal their 
inputs. This, however, is ensured by the mechanism. We should note that these 
cryptographic protocols, while theoretically tractable, are quite impractical. 
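To make the target of such a protocol concrete, here is a minimal sketch of the function it would have to evaluate obliviously for a second-price auction: it maps all bids to the only two facts that become public. The function name, the dictionary input and the tie-breaking rule are assumptions made for this illustration; no cryptography is modelled.

```python
def second_price_outcome(bids):
    """Return the only two facts that become public: (winner, price)."""
    if len(bids) < 2:
        raise ValueError("a second-price auction needs at least two bidders")
    # Rank bidders by bid, highest first; ties are broken by name (an arbitrary choice).
    ranked = sorted(bids, key=lambda name: (-bids[name], name))
    winner = ranked[0]
    price = bids[ranked[1]]   # amount of the second highest bid
    return winner, price

print(second_price_outcome({"alice": 5, "bob": 9, "carol": 7}))
```

The output reveals neither the winning bid nor the identity of the second highest bidder, matching the two goals listed in the restriction.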
Abstract. We define here the reduced genus of a multigraph as the minimum
genus of a hypergraph having the same adjacencies with the same
multiplicities. Through a study of embedded hypergraphs, we obtain new 
bounds on the coloring number, clique number and point arboricity of 
simple graphs of a given reduced genus. We present some new related 
problems on graph coloring and graph representation. 



1 Introduction 

Graph Coloring is a central topic in Graph Theory and numerous studies relate
coloring properties with the genus of a graph. The maximum chromatic number
among all graphs which can be embedded in a surface of genus g is given by
Heawood’s formula, as established by Ringel and Youngs for $g > 0$ [22]; the
case $g = 0$, which is the Four Color Theorem, has been established by Appel
and Haken [?] (see [?] for a simpler proof). Other approaches have related
the chromatic number with other graph invariants: for instance, Szekeres and
Wilf [?] gave the simple upper bound $\chi(G) \le sw(G) + 1$, where $sw(G)$ is the
maximum among all induced subgraphs H of G of the minimum degree of the
vertices of H (the value $col(G) = sw(G) + 1$, the coloring number of G, is actually
an upper bound for the choice number of G). On the other hand, it has been
established that, for any positive integer n, there exists an n-chromatic graph G
containing no triangles (see [?] for constructions).

As a natural generalization of graphs, Berge introduced the concept of
hypergraph [3]: a hypergraph is a pair $\mathcal{H} = (X, \mathcal{E})$, where $X$ is a finite set and
$\mathcal{E}$ is a family $(E_i)_{i \in I}$ of subsets of $X$, such that $E_i \neq \emptyset$ ($\forall i \in I$) and
$\bigcup_{i \in I} E_i = X$. The elements of $X$ and $\mathcal{E}$ are respectively the vertices and the
edges (or hyperedges) of the hypergraph. Different generalizations of the concept
of planarity to hypergraphs have been proposed (see [?], for instance). The
first generalization is due to Zykov [?]. He proposed to represent the edges of a
hypergraph by a subset of the faces of a planar map. Walsh [?] has shown that
Zykov’s definition (as well as another definition due to Cori [?]) is equivalent
to the following: A hypergraph is planar if and only if its vertex-edge incidence
graph is planar [?] (see also [?]). This planarity concept is easily extended

* This work was partially supported by the Esprit LTR Project no 20244- 
ALCOM IT. 
to oriented surfaces: the genus of a hypergraph is defined as the genus of its
incidence graph. The adjacency multigraph of a hypergraph $\mathcal{H}$ is the multigraph
on the same vertex set in which the number of edges incident to two vertices x
and y is the same as the number of edges of $\mathcal{H}$ containing x and y (the edges
of the hypergraph become edge-disjoint cliques of the adjacency multigraph).
The reduced genus of a multigraph G is then defined as the minimum genus of
a hypergraph whose adjacency multigraph is G.

In Section 2 we recall some basic definitions on hypergraphs and introduce
the corresponding notations used throughout this paper. In Section 3 we introduce
the charge of a vertex, which allows us to prove the existence of a vertex having
relatively few neighbors and, also, to bound the maximal cardinality of a set of
mutually adjacent vertices. In Section 4, new charge functions lead to improved
bounds in the cases of linear hypergraphs and triangle-free linear hypergraphs.
The results obtained in these sections lead in Section 5 to new bounds on the
coloring number, clique number and point arboricity of simple graphs of a given
reduced genus. Section 6 introduces some new problems on graph representation.

2 Preliminaries 

A hypergraph is a pair $\mathcal{H} = (X, \mathcal{E})$, where $X$ is a finite set and $\mathcal{E}$ is a family
$(E_i)_{i \in I}$ of subsets of $X$, such that $E_i \neq \emptyset$ ($\forall i \in I$) and $\bigcup_{i \in I} E_i = X$. The
elements of $X$ and $\mathcal{E}$ are respectively the vertices and the edges (or hyperedges)
of the hypergraph. Two vertices $x, y \in X$ are adjacent if they both belong to
some edge of $\mathcal{E}$; two edges $E_i, E_j$ are adjacent if their intersection is not empty.
A vertex $x \in X$ is incident to an edge $E_i \in \mathcal{E}$ if x belongs to $E_i$ (see [?]). In the
following, we do not allow a hypergraph to have a loop, that is a single element
edge. The neighbor set $N(v)$ of a vertex v is the set of the vertices to which v
is adjacent; the degree $d(v)$ of a vertex v is the number of edges including v.
The maximum cardinality of an edge of the hypergraph is denoted $\max |E|$. The
clique number $\omega(\mathcal{H})$ of the hypergraph is the maximum cardinality of a set of
pairwise adjacent vertices.

A hypergraph is k-uniform if all its edges have cardinality k; a 2-uniform
hypergraph is nothing but a multigraph. A hypergraph is linear if any two edges
have at most one common element:

$$\forall i \neq j, \quad |E_i \cap E_j| \le 1$$

Linearity somehow extends the notion of simple graph (a 2-uniform linear hy- 
pergraph is nothing but a simple graph). 

The sub-hypergraph of $\mathcal{H}$ induced by a subset $Y \subseteq X$ is the hypergraph
$\mathcal{H}_Y = (Y, \mathcal{E}_Y)$, where

$$\mathcal{E}_Y = \{E_i \cap Y : E_i \in \mathcal{E},\ E_i \cap Y \neq \emptyset\}$$



Definition 1. The incidence graph Incid($\mathcal{H}$) of $\mathcal{H}$ is the colored bipartite graph
on $(X, \mathcal{E})$ defined by the vertex-edge incidence, a vertex being colored white (resp.
black) if it belongs to $X$ (resp. $\mathcal{E}$).






Patrice Ossona de Mendez 



As a special case, if G is a graph, Incid(G) is the bicolored vertex-edge incidence 
graph of G. 

Remark 1. A hypergraph $\mathcal{H}$ is linear if and only if Incid($\mathcal{H}$) is $C_4$-free.

Definition 2. The adjacency multigraph Adj($\mathcal{H}$) of a hypergraph $\mathcal{H}$ is the multigraph
on the same vertex set in which the number of edges incident to two vertices
x and y is the same as the number of edges of $\mathcal{H}$ containing x and y (the edges
of the hypergraph become edge-disjoint cliques of the adjacency multigraph).

Remark that Adj($\mathcal{H}$) is a simple graph if and only if $\mathcal{H}$ is linear.
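The definitions above translate directly into small data structures. The sketch below is only an illustration: a hypergraph is assumed to be given as a list of vertex sets, the function names are hypothetical, and the two sample hypergraphs are made up for the example. It builds the incidence graph and the adjacency multigraph and checks the last remark on both samples.

```python
from itertools import combinations
from collections import Counter

def incidence_graph(edges):
    """Bipartite incidence graph as pairs (vertex, edge index) with vertex in that edge."""
    return {(v, i) for i, E in enumerate(edges) for v in E}

def adjacency_multigraph(edges):
    """Counter mapping each unordered vertex pair to the number of edges containing both."""
    adj = Counter()
    for E in edges:
        for x, y in combinations(sorted(E), 2):
            adj[(x, y)] += 1
    return adj

def is_linear(edges):
    """True if any two distinct edges share at most one vertex."""
    return all(len(set(E) & set(F)) <= 1 for E, F in combinations(edges, 2))

def is_simple(adj):
    """True if no pair of vertices is joined by parallel edges."""
    return all(m == 1 for m in adj.values())

linear_h = [{1, 2, 3}, {3, 4, 5}, {5, 6, 1}]
non_linear_h = [{1, 2, 3}, {2, 3, 4}]
for h in (linear_h, non_linear_h):
    print(is_linear(h), is_simple(adjacency_multigraph(h)))
```

In the second sample the pair (2, 3) lies in two edges, so the adjacency multigraph has a double edge and the hypergraph is not linear.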

Definition 3. The genus of a hypergraph is the genus of its incidence graph. 

Remark that the genus of a multigraph G is the same when considered as a 
multigraph and when considered as a 2-uniform hypergraph. 

3 Embedded Hypergraphs 

3.1 Charges 

In order to prove the existence of vertices with a “small” number of neighbors,
we introduce the charge of a vertex of an embedded hypergraph. This function
is closely related to Euler’s formula and could allow the adaptation of powerful
techniques to embedded hypergraphs, such as the discharging introduced by Heesch
[?], which is a crucial tool for the unavoidability part of the proofs of the
Four Color Theorem.

Definition 4. The charge $\xi(v)$ of a vertex v is defined by:

$$\xi(v) = \sum_{E \ni v} \left(3 - \frac{6}{|E|}\right) \qquad (1)$$

Lemma 1. The sum of the charges of the vertices of a hypergraph $\mathcal{H}$ on n
vertices and genus g is at most $6n + 12(g - 1)$.

Proof. The number of vertices of Incid($\mathcal{H}$) is $n + |\mathcal{E}|$, where $\mathcal{E}$ is the edge set of $\mathcal{H}$.
Hence, the number of faces of the embedding is $|E(\mathrm{Incid}(\mathcal{H}))| - n - |\mathcal{E}| - 2(g - 1)$.
As Incid($\mathcal{H}$) is a simple vertex-bipartite map, each face of an embedding has
length at least 4. Therefore, $2|E(\mathrm{Incid}(\mathcal{H}))|$ is greater than or equal to 4 times the
number of faces of the embedding. Altogether, we get:
$|E(\mathrm{Incid}(\mathcal{H}))| \le 2n + 2|\mathcal{E}| + 4(g - 1)$.

Thus, we have:

$$\sum_{v} \xi(v) = 3 \sum_{v} d(v) - 6 \sum_{E} \sum_{v \in E} \frac{1}{|E|}
 = 3|E(\mathrm{Incid}(\mathcal{H}))| - 6|\mathcal{E}|
 \le 6n + 12(g - 1)$$

□
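The counting identity in this proof is easy to check numerically. The sketch below assumes the charge of a vertex is the sum of $3 - 6/|E|$ over the edges containing it, as the proof above requires, and uses a small made-up hypergraph whose incidence graph is a 6-cycle with three pendant vertices (hence planar, g = 0). The function names are hypothetical.

```python
from fractions import Fraction

def charge(edges, v):
    """Charge of v: sum over the edges E containing v of 3 - 6/|E| (exact arithmetic)."""
    return sum(3 - Fraction(6, len(E)) for E in edges if v in E)

def total_charge(edges):
    """Sum of the charges over all vertices of the hypergraph."""
    vertices = set().union(*edges)
    return sum(charge(edges, v) for v in vertices)

edges = [{1, 2, 3}, {3, 4, 5}, {5, 6, 1}]    # planar example, genus g = 0
n, g = 6, 0
incidences = sum(len(E) for E in edges)      # number of edges of the incidence graph

print(total_charge(edges))                   # total charge
print(3 * incidences - 6 * len(edges))       # right-hand side of the identity
print(6 * n + 12 * (g - 1))                  # bound of Lemma 1
```

Using `Fraction` avoids rounding issues when edge cardinalities do not divide 6.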



Lemma 2. For any hypergraph $\mathcal{H}$ of genus g with at least an edge of cardinality
at least 3, there exists a map of genus g, whose vertex-face hypergraph $\mathcal{H}^+$
satisfies the following conditions:

— any edge of $\mathcal{H}$ of size at least 3 is an edge of $\mathcal{H}^+$,

— any edge of $\mathcal{H}$ of size at most 2 is included in an edge of $\mathcal{H}^+$,

— the maximal cardinality of an edge of $\mathcal{H}^+$ is equal to the maximal cardinality
of an edge of $\mathcal{H}$,

— the number of neighbors of a vertex v of $\mathcal{H}^+$ (which is an upper bound for
the number of neighbors of v in $\mathcal{H}$) is bounded by:

$$|N^+(v)| \le \sum_{E \ni v} (|E| - 2) \qquad (2)$$

(where the sum is taken over all the edges of $\mathcal{H}^+$ containing v)

Proof. Quadrangulate the embedding of Incid($\mathcal{H}$) by adding new black vertices
of degree 3. Then, remove all the black vertices of degree 2. This is the incidence
graph of a hypergraph $\mathcal{H}^+$. By construction, all but the last property are
obviously satisfied. As no vertex of $\mathcal{H}^+$ may have degree 1 and as two edges of $\mathcal{H}$
incident to a vertex v which are consecutive when considering the neighbors of
v in the embedding of Incid($\mathcal{H}$) have a non-empty intersection, the last property
follows. □



3.2 Vertices with Few Neighbors 

Theorem 1. For any integer $g \ge 0$, there exists a constant $M(g)$, such that any
hypergraph of genus g with at least an edge of cardinality $M(g)$ has a vertex v
with at most $2\max|E| - 4$ neighbors.

Proof. If a hypergraph $\mathcal{H}$ on n vertices has genus g, there exists a vertex v of
$\mathcal{H}^+$ with charge $\xi(v) \le 6 + \frac{12}{n}(g - 1)$. As $n \ge \max|E|$ and as $3 - \frac{6}{|E|}$ tends to 3 as
$|E|$ goes to infinity, if $M(g)$ is large enough, the maximum number of neighbors
of v is achieved for v incident to two edges of cardinality $\max|E|$. The number
of neighbors of v in $\mathcal{H}^+$ is hence bounded by $2\max|E| - 4$. □
Theorem 2. For any integer $g \ge 0$, there exists a constant $M'(g)$, such that any
hypergraph $\mathcal{H}$ of genus g with at least an edge of cardinality $M'(g)$ and such that
$\mathcal{H}^+$ has no vertex of degree less than 3 has a vertex v with at most $\max|E| + 4$
neighbors.

Proof. As previously, if $M'(g)$ is large enough, the maximum number of
neighbors of v is achieved for v incident to one edge of cardinality $\max|E|$, one edge
of cardinality 6 and one edge of cardinality 2. The number of neighbors of v in
$\mathcal{H}^+$ is hence bounded by $\max|E| + 4$. □

3.3 Maximal Cliques 

Lemma 3. For any integer $g \ge 0$, there exists a constant $C(g)$, such that, for
any hypergraph of genus g on $n > C(g)$ vertices, if all the vertices are pairwise
adjacent, then there is an edge of cardinality at least $\frac{2n}{3}$.

Proof. As the complete graphs of genus at most g have a size bounded by a
function of g, we may assume (by choosing $C(g)$ large enough) that $\mathcal{H}$ has at
least an edge of size at least 3. Furthermore, as the adjacency multigraph of $\mathcal{H}$
is a partial graph of the one of $\mathcal{H}^+$ and as the maximal sizes of the edges of $\mathcal{H}$
and $\mathcal{H}^+$ are equal, we may consider $\mathcal{H}^+$ in place of $\mathcal{H}$.

Then, we have to consider the following cases:

— There exists a vertex v of degree 1.

This vertex is then incident to an edge of cardinality n.

— All the vertices have degree at least 3.

Then, there exists a vertex having at most $\max|E| + 4$ neighbors. Thus,
$\max|E|$ is greater than or equal to $n - 5$.

— All the vertices have degree at least 2 and there exists a vertex v of degree
2 (incident to edges $E_\alpha$ and $E_\beta$).

Let $A = E_\alpha \setminus E_\beta$ and $B = E_\beta \setminus E_\alpha$. Then, $A$, $B$, $E_\alpha \cap E_\beta$ form a partition of
the vertex set of $\mathcal{H}^+$. Assume that $E_\alpha$ and $E_\beta$ have cardinality strictly smaller
than $\frac{2n}{3}$. It follows that A and B have cardinality strictly greater than $\frac{n}{3}$.

• No vertex of $A \cup B$ has degree 2.

Let $\mathcal{H}'$ be the hypergraph obtained from $\mathcal{H}^+$ by deleting all the vertices
of $E_\alpha \cap E_\beta$ except v and adding an edge E including v, a vertex of A
and a vertex of B (it can be easily done in such a way that the genus of
$\mathcal{H}'$ is at most g). Then, as the adjacency multigraph of $\mathcal{H}'$ is a complete
graph on $|A| + |B| + 1$ vertices (plus some parallel edges), the number of
neighbors of any vertex $x \in E_\alpha$ is given by $|N(x)| = |A| + |B|$. According
to this value and the fact that all the vertices of $\mathcal{H}'$ have degree at least
3, the minimal value of the charge of x is achieved if x belongs to $E_\alpha$,
an edge of cardinality 3 and an edge of cardinality $|B| - 1$. Then $\xi(x)$ is
at least:

$$7 - \frac{6}{\frac{n}{3} + 1} - \frac{6}{\frac{n}{3} - 1}$$

If n is large enough, this value is strictly greater than $6 + \frac{12}{n}(g - 1)$. As
the same holds for the vertices in $E_\beta$, a contradiction follows.

• There exists a vertex $w \in A \cup B$ of degree 2.

Without loss of generality, we may assume that w belongs to A and is
incident to edges $E_\alpha$ and $E_\gamma$.

* There exists a vertex $z \in A$ which does not belong to $E_\gamma$.

By contracting the edges of Incid($\mathcal{H}^+$) incident to z, the vertices
corresponding to $E_\alpha$, $E_\beta$, z and those corresponding to the vertices in B
form a complete bipartite graph homeomorphic to $K_{3,|B|}$. Hence, $|B|$
is bounded by a constant depending on g, which contradicts $|B| > \frac{n}{3}$.

* All the vertices of $E_\alpha$ belong to $E_\beta \cup E_\gamma$.

Then, every vertex belongs to two edges among $E_\alpha$, $E_\beta$, $E_\gamma$. Hence,
$|E_\alpha| + |E_\beta| + |E_\gamma| \ge 2n$ and one of these edges has cardinality at
least $\frac{2n}{3}$.

□

As a direct consequence of this lemma, we have: 

Theorem 3. For any integer $g \ge 0$, there exists a constant $C(g)$, such that, for
any hypergraph $\mathcal{H}$ of genus g:

$$\omega(\mathcal{H}) \le \max\left(C(g), \tfrac{3}{2} \max|E|\right) \qquad (3)$$

Remark 2. By an easy technical proof, one checks that $C(0) = 4$, that is: Any
planar hypergraph $\mathcal{H}$ having a clique number at least 5 has an edge of size at
least $\frac{2}{3}\omega(\mathcal{H})$.

4 Embedded Linear Hypergraphs 

4.1 Linear Charges 

We shall give an alternate definition of the charge of a vertex taking advantage 
of the linearity assumption, the linear charge. 

Definition 5. The linear charge $\xi_l(v)$ of a vertex v is defined by:

$$\xi_l(v) = \sum_{E \ni v} \left(4 - \frac{6}{|E|}\right) \qquad (4)$$

Lemma 4. The sum of the linear charges of the vertices of a linear hypergraph
$\mathcal{H}$ of genus g with n vertices is bounded by $6n + 12(g - 1)$.

Proof. The number of vertices of Incid($\mathcal{H}$) is $n + |\mathcal{E}|$, where $\mathcal{E}$ is the edge set of $\mathcal{H}$.
Hence, the number of faces of the embedding is $|E(\mathrm{Incid}(\mathcal{H}))| - n - |\mathcal{E}| - 2(g - 1)$.
As Incid($\mathcal{H}$) is a $C_4$-free simple vertex-bipartite map, each face of an embedding
has length at least 6. Therefore, $2|E(\mathrm{Incid}(\mathcal{H}))|$ is greater than or equal to 6 times
the number of faces of the embedding. Altogether, we get:
$2|E(\mathrm{Incid}(\mathcal{H}))| \le 3n + 3|\mathcal{E}| + 6(g - 1)$.

Thus, we have:

$$\sum_{v} \xi_l(v) = 4 \sum_{v} d(v) - 6 \sum_{E} \sum_{v \in E} \frac{1}{|E|}
 = 4|E(\mathrm{Incid}(\mathcal{H}))| - 6|\mathcal{E}|
 \le 6n + 12(g - 1)$$

□



4.2 Vertices with Few Neighbors 

Theorem 4. For any integer $g \ge 0$, there exists a constant $M_l(g)$, such that
any linear hypergraph of genus g with at least an edge of cardinality $M_l(g)$ has
a vertex v with at most $\max|E| + 1$ neighbors.

Proof. If a hypergraph $\mathcal{H}$ on n vertices has genus g, there exists a vertex v of $\mathcal{H}$
with charge $\xi_l(v) \le 6 + \frac{12}{n}(g - 1)$. As $n \ge \max|E|$ and as $4 - \frac{6}{|E|}$ tends to 4 as
$|E|$ goes to infinity, if $M_l(g)$ is large enough, the maximum number of neighbors
of v is achieved for v incident to one edge of cardinality $\max|E|$ and two edges
of cardinality 2, or one edge of cardinality $\max|E|$ and one edge of cardinality 3.
The number of neighbors of v in $\mathcal{H}$ is hence bounded by $\max|E| + 1$. □



4.3 Maximal Cliques 

Lemma 5. For any integer $g \ge 0$, there exists a constant $C_l(g)$, such that, for
any linear hypergraph of genus g on $n > C_l(g)$ vertices, if all the vertices are
pairwise adjacent, then there is an edge of cardinality at least $n - 1$.

Proof. As $\mathcal{H}$ is linear, there exists a vertex v having at most $\max|E| + 1$
neighbors. Hence, $\max|E| \ge n - 2$. Assume that the maximal cardinality of an edge
of $\mathcal{H}$ is $n - 2$ and that this value is achieved by an edge E. Let a and b be the
two vertices of $\mathcal{H}$ that do not belong to E. As $\mathcal{H}$ is linear, no edge different from
E may include (at least) two vertices of E. In the same way, at most one edge
includes both a and b and at most one vertex of E. Therefore, by deleting E
and at most one vertex, we get a 2-uniform linear hypergraph of genus at most
g which adjacency multigraph is homeomorphic to K^^n-s (plus some parallel 
edges). If Ci{g) is large enough, the complete bipartite graph has genus 

strictly greater than g and we are led to a contradiction. □
As a direct consequence, we get:

Theorem 5. For any integer $g \ge 0$, there exists a constant $C_l(g)$, such that any
linear hypergraph $\mathcal{H}$ of genus g has a clique size bounded by:

$$\omega(\mathcal{H}) \le \max(C_l(g), \max|E| + 1) \qquad (5)$$

Remark 3. An easy technical analysis shows that $C_l(0) = 4$, that is: any planar
linear hypergraph $\mathcal{H}$ which has a clique of size at least 5 has an edge of cardinality
at least $\omega(\mathcal{H}) - 1$.

4.4 Triangle-Free Linear Hypergraphs 

We shall say that a linear hypergraph is triangle-free if any triplet of mutually
adjacent vertices is included in an edge. Obviously, a triangle-free linear
hypergraph $\mathcal{H}$ satisfies:

$$\omega(\mathcal{H}) = \max|E| \qquad (6)$$

For such hypergraphs, we may also define a new charge function by:

$$\xi_t(v) = \sum_{E \ni v} \left(3 - \frac{4}{|E|}\right) \qquad (7)$$

As a linear hypergraph $\mathcal{H}$ is obviously triangle-free if and only if Incid($\mathcal{H}$) is
$C_6$-free, we get, with similar arguments as those used in Lemma 1 and Lemma
4, that the sum of the charges of the vertices of a triangle-free linear hypergraph
$\mathcal{H}$ with n vertices is bounded by $4n + 8(g - 1)$. With similar arguments as those
used in Theorem 1 and Theorem 4 we get:

Theorem 6. For any integer $g \ge 0$, there exists a constant $M'(g)$, such that
any triangle-free linear hypergraph of genus g with at least an edge of cardinality
$M'(g)$ has a vertex v with at most $\max|E|$ neighbors.

5 Graphs with Given Reduced Genus 

As any multigraph is its own adjacency multigraph, we may define: 

Definition 6. The reduced genus $\tilde{\gamma}(G)$ of a multigraph G is the minimum genus
of a hypergraph whose adjacency multigraph is G.

Remark that, if $\gamma(G)$ denotes the genus of the graph G,

$$0 \le \tilde{\gamma}(G) \le \gamma(G) \qquad (8)$$

For instance, the reduced genus of a triangle-free multigraph is equal to its genus 
and the reduced genus of a complete graph is equal to 0. 

A generalization of the triangle-free simple graphs is the clique partitioned 
graphs: 
Definition 7. A simple graph G is a clique partitioned graph if no two maximal
cliques of G have more than one vertex in common.

Examples of clique partitioned graphs are triangle free graphs and line graphs of 
triangle-free graphs. It is not difficult to check that any clique partitioned graph 
of reduced genus g is the adjacency graph of a triangle-free linear hypergraph of 
genus g. 

Problem 1. For a fixed integer $g \ge 0$, what is the complexity of the following
decision problem: “Given a graph G, decide whether the reduced genus of G is 
at most g”? 

5.1 Coloring 

We first recall some basic definitions on graph coloring. The concept of list
coloring was introduced by Vizing [?] and independently by Erdős, Rubin and
Taylor [?]: Assuming that a list $L(v)$ is associated to each vertex v of a graph
G, a mapping $f : V \to \bigcup_{v \in V} L(v)$ is a list coloring if f is a proper coloring
and $f(v) \in L(v)$ holds for all $v \in V$. If $|L(v)| = k$ holds for all $v \in V$, L is a
k-assignment. The choice number $ch(G)$ of G is the smallest k such that every
k-assignment of G admits a strict list coloring. For $ch(G) \le k$, G is said to
be k-choosable. The coloring number $col(G)$ of a graph G is the largest integer
k such that G has a subgraph with minimum degree $k - 1$. If $\omega(G)$ and $\chi(G)$
denote, as usual, the clique number and the chromatic number of the graph G,
we have:

$$\omega(G) \le \chi(G) \le ch(G) \le col(G) \qquad (9)$$
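The coloring number can be computed by repeatedly deleting a vertex of minimum degree: the largest degree seen at deletion time equals the maximum, over all subgraphs, of the minimum degree. The sketch below illustrates this standard procedure; the function name and the adjacency-set input format are assumptions made for the example.

```python
def coloring_number(adj):
    """col(G) = 1 + max over subgraphs of the minimum degree, by min-degree removal."""
    adj = {v: set(ns) for v, ns in adj.items()}    # work on a copy
    best = 0
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))    # a vertex of minimum degree
        best = max(best, len(adj[v]))
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
    return best + 1

# K4 (complete graph on 4 vertices) and C5 (cycle on 5 vertices)
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
c5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(coloring_number(k4), coloring_number(c5))
```

For $K_4$ all four invariants in (9) coincide, while for $C_5$ the clique number is 2 and the other three equal 3.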

According to the definition of the coloring number, we have, as a direct
consequence of Theorem 4:

Theorem 7. For any integer $g \ge 0$, there exists a constant $H_l(g)$, such that
any simple graph of reduced genus g has a coloring number bounded by:

$$col(G) \le \max(H_l(g), \omega(G) + 2) \qquad (10)$$

And, according to Theorem 6:

Theorem 8. For any integer $g \ge 0$, there exists a constant $H'(g)$, such that
any clique partitioned graph of reduced genus g has a coloring number bounded
by:

$$col(G) \le \max(H'(g), \omega(G) + 1) \qquad (11)$$

On the other hand, the following results are known on planar graphs:

— Every planar graph is 4-colorable (Appel and Haken [?]; Robertson, Sanders,
Seymour and Thomas [?])

— Every planar graph is 5-choosable (Thomassen [29])

— There exists a non-4-choosable planar graph (a 3-colorable non-4-choosable
graph has been independently obtained by H. Ben Meeki and H.A. Kierstead
and by M. Voigt and B. Wirth [?])

— Every triangle-free planar graph is 3-colorable (Grötzsch [?])

— Every triangle-free planar graph is 4-choosable (obvious, as $col(G) \le 4$)

— There exists a non-3-choosable triangle-free planar graph (Voigt [?])

These results suggest the following problems: 

Problem 2. For any integer $g \ge 0$, what is the smallest value $C(g) \ge 0$, such
that the choice number of any simple graph G with reduced genus g is bounded
by:

$$ch(G) \le \max(C(g), \omega(G) + 2) \qquad (12)$$

In particular, if a simple graph G has reduced genus 0, is G $(\omega(G) + 2)$-choosable?

Problem 3. Does any simple graph G with reduced genus g have a chromatic
number bounded by:

$$\chi(G) \le \max(H(g), \omega(G) + 1) \qquad (13)$$

where $H(g) = \frac{1}{2}\left(7 + \sqrt{1 + 48g}\right)$ is Heawood’s function?

If G is clique partitioned, can the bound be improved as follows?

$$\chi(G) \le \max(H(g) - 1, \omega(G)) \qquad (14)$$

Remark that the bound $\omega(G) + 1$ would be optimal:

Proposition 1. For all $k \ge 2$, there exists a simple graph G with reduced genus
0, such that $\omega(G) = k$ and $\chi(G) = k + 1$.

Proof. If $k = 2$, consider $C_5$.

Let $\mathcal{H}$ be the linear planar hypergraph with vertex set

$$V = \{a, b, c, \alpha_2, \ldots, \alpha_k, \beta_2, \ldots, \beta_{k-1}\}$$

and edge set $\mathcal{E} = \{E_1, \ldots, E_{2k+1}\}$, where

$E_1 = \{a, \alpha_2, \ldots, \alpha_k\}$

$E_i = \{b, \alpha_i\}$ $(2 \le i \le k)$

$E_{k+1} = \{b, \beta_2, \ldots, \beta_{k-1}, \alpha_k\}$

$E_{k+i} = \{c, \beta_i\}$ $(2 \le i \le k-1)$

$E_{2k} = \{c, \alpha_k\}$

$E_{2k+1} = \{a, c\}$

and let G = Adj($\mathcal{H}$) be the adjacency multigraph of $\mathcal{H}$. Obviously, $k \le \chi(G) \le k + 1$.
Assume $\chi(G) = k$ and let Color : $V \to [1, \ldots, k]$ be a proper coloring of
G. Then, considering the cliques induced by $E_1$ and $E_i$ for $2 \le i \le k$, we get
Color(a) = Color(b). Similarly, considering $E_{k+1}$ and $E_{k+i}$ for $2 \le i \le k$, we
get Color(b) = Color(c). Thus, Color(a) = Color(c) and Color is not a proper
coloring, according to the edge $E_{2k+1}$. Hence, $\chi(G) = k + 1$. □
Also, the condition that G is a simple graph (and not a multigraph) is crucial, as
adding parallel edges allows one to drastically decrease the reduced genus. Actually,
we have:

Proposition 2. For all $k \ge 1$, there exists a multigraph G which is the adjacency
graph of a 2k-uniform planar hypergraph $\mathcal{H}$, and such that $\chi(G) \ge \frac{5}{4}\omega(G)$.

Proof. Consider the hypergraph $\mathcal{H}$ with vertex set

$$V = \{a_1, \ldots, a_k, b_1, \ldots, b_k, c_1, \ldots, c_k, d_1, \ldots, d_k, e_1, \ldots, e_k\}$$

and edge set $\mathcal{E} = \{E_1, \ldots, E_5\}$, where

$E_1 = \{a_1, \ldots, a_k, b_1, \ldots, b_k\}$

$E_2 = \{b_1, \ldots, b_k, c_1, \ldots, c_k\}$

$E_3 = \{c_1, \ldots, c_k, d_1, \ldots, d_k\}$

$E_4 = \{d_1, \ldots, d_k, e_1, \ldots, e_k\}$

$E_5 = \{e_1, \ldots, e_k, a_1, \ldots, a_k\}$

Then, $\omega(G) = 2k$. Consider any optimal proper strict coloring of G and let
$C_1, \ldots, C_5$ be the color sets used by vertices $a_1, \ldots, a_k$, resp. $b_1, \ldots, b_k$, resp.
$c_1, \ldots, c_k$, resp. $d_1, \ldots, d_k$, resp. $e_1, \ldots, e_k$. Then, any $C_i$ has cardinality k and
any three of the $C_i$ have an empty intersection. Thus,

$$\chi(G) = \Big|\bigcup_i C_i\Big|
 = \sum_i |C_i| - \sum_{i < j} |C_i \cap C_j|
 = \sum_i |C_i| - \frac{1}{2} \sum_i \Big|C_i \cap \bigcup_{j \neq i} C_j\Big|
 \ge \frac{1}{2} \sum_i |C_i|
 \ge \frac{5k}{2}$$

□
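For small k the claim can be checked by brute force. The sketch below builds the adjacency graph of the five-edge hypergraph of the proof (parallel edges are ignored, since they do not affect colorings or cliques) and computes its clique and chromatic numbers exhaustively. All function names are hypothetical and the search is only practical for very small k.

```python
from itertools import combinations

def blown_up_five_cycle(k):
    """Adjacency sets of Adj(H) for the five edges E_1..E_5 on groups a, b, c, d, e."""
    groups = [[(g, i) for i in range(k)] for g in "abcde"]
    edges = [groups[j] + groups[(j + 1) % 5] for j in range(5)]
    adj = {v: set() for grp in groups for v in grp}
    for E in edges:
        for x, y in combinations(E, 2):
            adj[x].add(y)
            adj[y].add(x)
    return adj

def colorable(adj, q):
    """Backtracking test: does the graph admit a proper coloring with q colors?"""
    order, color = list(adj), {}
    def extend(pos):
        if pos == len(order):
            return True
        v = order[pos]
        used = {color[u] for u in adj[v] if u in color}
        for c in range(q):
            if c not in used:
                color[v] = c
                if extend(pos + 1):
                    return True
                del color[v]
        return False
    return extend(0)

def chromatic_number(adj):
    q = 1
    while not colorable(adj, q):
        q += 1
    return q

def clique_number(adj):
    vs = list(adj)
    for r in range(len(vs), 0, -1):
        for S in combinations(vs, r):
            if all(y in adj[x] for x, y in combinations(S, 2)):
                return r
    return 0

for k in (1, 2):
    g = blown_up_five_cycle(k)
    print(k, clique_number(g), chromatic_number(g))
```

For k = 1 the graph is simply the 5-cycle, so the construction reduces to the familiar example with clique number 2 and chromatic number 3.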



5.2 Maximal Cliques 

Computing the clique number of a graph is an NP-complete problem. However,
we shall prove that, for any fixed $g \ge 0$, this problem is polynomial. More
precisely:

Theorem 9. For any fixed $g \ge 0$, there exists a polynomial algorithm that
enumerates all the maximal cliques of any simple graph G with reduced genus g.
Proof. Consider a linear hypergraph $\mathcal{H}$ of genus g, whose adjacency multigraph
is G. According to Theorem 5, the maximal cliques of G have a simple structure:
either they are of size at most $C_l(g)$, or all the vertices of the clique (but maybe
one) belong to a same edge of $\mathcal{H}$. Hence, the number of maximal cliques is
bounded by Pg{n) = + I'^l + ^1^1- As the number of edges \£\ of H is 

at most P{n) is a polynomial function of n. 

Consider the following recursive algorithm computing the set $T(G)$ of the
maximal cliques of G:

— If G is empty, $T(G) = \{\emptyset\}$;

— Otherwise, $T(G)$ is the set of the maximal sets in $T(G - v) \cup \{K \cup \{v\} : K \in
T(G - v) \text{ and } K \cap N(v) \neq \emptyset\}$, where v is any vertex of G.

As the number of maximal cliques of a graph of reduced genus g is bounded by a
polynomial function of n and as the deletion of a vertex may not increase the
reduced genus, the preceding algorithm computes the set of the maximal cliques
of G in polynomial time. □
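A runnable sketch of such a vertex-by-vertex recursion is given below. It is a variant, not a transcription of the recurrence above: each clique K of G − v is extended to $(K \cap N(v)) \cup \{v\}$, which is always a clique, whereas $K \cup \{v\}$ is a clique only when $K \subseteq N(v)$. The function name and the adjacency-set input format are assumptions made for the example.

```python
def maximal_cliques(adj):
    """Incrementally build the maximal cliques of the graph given by adjacency sets."""
    if not adj:
        return {frozenset()}
    v = next(iter(adj))
    rest = {u: ns - {v} for u, ns in adj.items() if u != v}   # the graph G - v
    smaller = maximal_cliques(rest)
    candidates = set(smaller) | {(K & adj[v]) | {v} for K in smaller}
    # keep only the inclusion-maximal candidate sets
    return {K for K in candidates if not any(K < other for other in candidates)}

# a triangle 1-2-3 with a pendant edge 3-4 and an isolated vertex 5
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}, 5: set()}
print(sorted(sorted(K) for K in maximal_cliques(graph)))
```

Each level of the recursion handles at most twice as many candidate sets as there are maximal cliques in G − v, so the running time stays polynomial whenever the number of maximal cliques is polynomially bounded, as in the proof.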



5.3 Point Arboricity 

Definition 8. The point arboricity $\rho(G)$ of a simple graph G is the minimum
number of subsets into which the vertex set of G may be partitioned so that each
subset induces an acyclic subgraph.

From [?], it follows that:

$$\left\lceil \frac{\chi(G)}{2} \right\rceil \le \rho(G) \le \left\lceil \frac{col(G)}{2} \right\rceil \qquad (15)$$



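So that, according to Theorem 7, for any integer $g \ge 0$, if $\omega(G)$ is large enough,
then:

$$\left\lceil \frac{\omega(G)}{2} \right\rceil \le \rho(G) \le \left\lceil \frac{\omega(G) + 2}{2} \right\rceil \qquad (16)$$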



For a planar graph G, it is proved in [?] that the bound $\rho(G) \le 3$ is sharp.
For a triangle-free planar graph, $col(G) \le 4$ so that $\rho(G) \le 2$ and the bound
is obviously sharp. Moreover, according to Proposition 1, for any integer $k \ge 2$,
there exists a simple graph G of reduced genus 0, such that $\omega(G) = k$ and
$\chi(G) = k + 1$. For such a simple graph, we get:

$$\rho(G) \ge \left\lceil \frac{\omega(G) + 1}{2} \right\rceil \qquad (17)$$



Thus, if true, the following bound would be sharp.

Problem 4. For any integer $g \ge 0$, does there exist a constant $P(g)$ such that
any simple graph of reduced genus g and clique number $\omega(G) \ge P(g)$ has a point
arboricity bounded by:

$$\rho(G) \le 1 + \left\lfloor \frac{\omega(G)}{2} \right\rfloor \qquad (18)$$
6 Representations of Hypergraphs

It appears that a planar graph representation which may be easily extended to 
planar linear hypergraphs is the visibility representation. 

6.1 Visibility Representation 

A useful way of representing planar graphs is the visibility representation: 

Theorem (Rosenstiehl, Tarjan [?]). A graph is planar if and only if it
has a visibility representation, that is a representation in the plane in which the
vertices are represented by horizontal line segments, the edges are represented by
vertical line segments, and such that:

— no two horizontal (resp. vertical) segments intersect,

— no two segments cross,

— an edge is incident to a vertex if and only if the corresponding segments
touch each other.

This theorem has been extended to toroidal visibility representations (see [?]).
This representation may be extended to planar hypergraphs and (up to a 
reformulation of the original theorem) we get: 

Theorem (de Fraysseix, Ossona de Mendez, Pach). A hypergraph is planar if and 
only if it has a “visibility representation”, that is a representation 
in which the vertices are represented by horizontal line segments, the edges are 
represented by vertical line segments, and such that: 

— no two horizontal (resp. vertical) segments intersect, 

— no two segments cross, 

— an edge is incident to a vertex if and only if the corresponding segments 
touch each other. 

6.2 Straight Line Representation 

It is a classical result independently established by Wagner Fary and 
Stein [l2t)| (which is also a consequence of the Steinitz’s theorem on convex poly- 
topes m) that any simple planar graph has a straight line representation in the 
plane. 

Problem 5. Has any planar linear hypergraph a straight line representation, that 
is a representation in which: 

— vertices are represented by pairwise distinct points, 

— edges are represented by pairwise non-overlapping segments, 

— a vertex v is incident to an edge E if and only if the point representing v 
belongs to the segment representing E. 

If we relax the “straight line” condition is the following true? 

Problem 6. Has any hypergraph of genus g a representation on a surface of genus 
g in which: 

— vertices are represented by pairwise distinct points, 

— edges are represented by pairwise non-overlapping arcs, 

— a vertex v is incident to an edge E if and only if the point representing v 
belongs to the arc representing E. 

In a “dual” form (dual in the sense of dual hypergraphs, not in the sense of 
algebraic or planar duality), Problem 5 may be restated as follows: 

Problem 7. Is any simple graph of reduced genus 0 the intersection graph of a 
family of straight line segments in the plane? 

This problem extends the following conjecture on planar graphs: 

Conjecture 1. Any simple planar graph is the intersection graph of a family of 
straight line segments in the plane. 

A weaker conjecture may be stated on planar multigraphs (it has been partially 
solved in and some advancement on the “stretching” problem of the arcs may 
be found in | 7 )): 

Conjecture 2. Any planar multigraph is the intersection multigraph of a family 
of arcs. 

Actually, Problem 6 is an extension of this problem, as it can be restated as 
follows: 

Problem 8. Is any multigraph of reduced genus g the intersection multigraph of 
a family of arcs on a surface of genus g? 

6.3 Representation by Contact of Triangles 

Other representations may probably be extended to planar linear hypergraphs. 
For instance, the representation by contact of triangles: 

Theorem Any planar graph is a contact graph of triangles. 

Problem 9. Is any planar linear hypergraph the contact hypergraph of triangles 
in the plane? 

Abstract. This paper surveys recent results on the classification of discrete 
temporal properties, gives an introduction to the methods that 
have been developed to obtain them, and explains the connections to 
the theory of finite automata, the theory of finite semigroups, and to 
first-order logic. 



The salient features of temporal logic are its modalities, which allow it to express 
temporal relationships. So it is only natural to investigate how and how much 
each individual modality contributes to the expressive power of temporal logic. 
One would like to be able to answer questions like: Can a given property be 
expressed without using the modality “next” ? What properties can be expressed 
using formulas where the nesting depth in the modality “until” is at most 2? 

This survey reports on recent progress on answering such questions, present- 
ing results from the papers 0, 0, PI, PI, and m and the thesis PI- 

The results fall into three categories: (A) characterizations of fragments of 
future temporal logic, where a fragment is determined by which future modali- 
ties (modalities referring to the future and present only) are allowed in building 
formulas; (B) characterizations of symmetric fragments, where with each modal- 
ity its symmetric past/future counterpart is allowed; (C) characterization of the 
levels of the until hierarchy, where the nesting depth in the “until” modality 
required to express a property in future temporal logic determines its level. 

An almost complete account of the results from category (A) will be given in 
Sections 0 through E] including full proofs. These results can be obtained with a 
reasonable effort in an automata-theoretic framework and the methods used to 
obtain them are fundamental to the whole subject, in particular, to the results 
from categories (B) and (C). The results from these two categories are presented 
in Sections 0 and El without going into details of the proofs, which would require 
a thorough background in finite semigroup theory. 

In computer science applications, temporal formulas are interpreted in (finite 
or infinite) sequences (colored discrete linear orderings), which are nothing else 
than words (strings or ω-words). Therefore, the set of models of a temporal 
formula — the property defined by it — can be viewed as a formal language, in 

* Part of the research reported here was conducted while the author was postdoc at 
DIMACS as part of the Special Year on Logic and Algorithms. 

^ I use “temporal logic” as a synonym for “propositional linear-time temporal logic.” 
fact, a regular language. In other words, characterizing a fragment of temporal 
logic amounts to characterizing a certain class of regular languages. 

There is a long tradition of classifying regular languages, going back to as 
early as 1965, when Schützenberger, in the seminal paper of the field, H2|, 
characterized the star-free languages as being exactly the ones whose minimal 
DFA’s are counter-free. Given that temporal logic and star-free expressions have 
the same expressive power (which was only realized much later [51411 l)j l), 
Schützenberger’s result also marked the first step in classifying discrete 
temporal properties: it gave an effective characterization of the class of all 
regular languages expressible in temporal logic. After an introductory section 
with terminology and notation, this survey starts off in Section 2 with a new, 
brief proof that every language recognized by a counter-free DFA is expressible 
in future temporal logic. 

This paper only deals with strings, but most of the results have been extended 
to ω-words. The reader is referred to the respective original papers. 



1 Basic Terminology and Notation 

We interpret temporal formulas in strings and use standard notation with regard 
to strings. The positions of a string of length n are indexed by 0, ..., n − 1. 
When u is a string of length n and 0 ≤ i ≤ j ≤ n, then u(i, j) denotes the string 
u(i)u(i + 1) ... u(j − 1). Further, u(i, ∗) denotes the suffix u(i, n). 
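
The substring notation coincides with half-open slicing, which the following small sketch makes explicit. The helper names `sub` and `suffix` are ours and serve only as an illustration.

```python
def sub(u: str, i: int, j: int) -> str:
    """u(i, j): the symbols at positions i, ..., j - 1 (needs 0 <= i <= j <= len(u))."""
    if not 0 <= i <= j <= len(u):
        raise ValueError("indices out of range")
    return u[i:j]


def suffix(u: str, i: int) -> str:
    """u(i, *): shorthand for u(i, n), where n is the length of u."""
    return sub(u, i, len(u))
```

In particular, u(i, i) is the empty string and u(0, ∗) is u itself.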

A temporal formula over some alphabet Σ is built from the logical constants 
⊤ (true) and ⊥ (false) and the elements of Σ using the boolean connectives ¬ 
(negation), ∧ (conjunction), and ∨ (disjunction) and the temporal modalities X 
(next), F (eventually), and U (until). All connectives and modalities are unary 
except for ∧, ∨, and U, which are binary and written in infix notation. The set 
of all temporal formulas is denoted by TL. 

A fragment of temporal logic is a subset of TL obtained by allowing only 
the use of certain temporal modalities in the construction of formulas. When l 
is a list of temporal modalities, then TL[l] denotes the respective fragment. For 
instance, TL[F] stands for the class of all temporal formulas which can be built 
from alphabet symbols and the logical constants using boolean connectives and 
F as the only temporal modality. 

Given a temporal formula φ and a string u, one defines what it means for φ 
to hold in u, denoted u ⊨ φ. This definition is inductive, where, in particular, 

— for every symbol a, u ⊨ a if u(0) = a, 

— u ⊨ Xφ if |u| > 1 and u(1, ∗) ⊨ φ, 

— u ⊨ Fφ if there exists i with 0 < i < |u| such that u(i, ∗) ⊨ φ, and 

— u ⊨ φ U ψ if there exists i with 0 < i < |u| such that u(j, ∗) ⊨ φ for every 
j ∈ {1, ..., i − 1} and u(i, ∗) ⊨ ψ. 

Note that ⊥ U φ has the same meaning as Xφ for any temporal formula φ, 
and ⊤ U φ has the same meaning as Fφ, which means X and F can be derived 
from U. Sometimes, we will also use the temporal modality G (always), which is 
another derived modality: it stands for ¬F¬. 
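
The inductive definition can be turned into a small evaluator. The sketch below is only an illustration under our own assumptions: formulas are encoded as nested tuples such as ("U", f, g), an encoding that is not part of the paper, and strings are ordinary nonempty Python strings.

```python
TRUE, FALSE = ("true",), ("false",)


def holds(phi, u):
    """Decide whether u |= phi for a nonempty string u (strict semantics)."""
    op = phi[0]
    if op == "true":
        return True
    if op == "false":
        return False
    if op == "sym":  # alphabet symbol: compare with the first position
        return u[0] == phi[1]
    if op == "not":
        return not holds(phi[1], u)
    if op == "and":
        return holds(phi[1], u) and holds(phi[2], u)
    if op == "or":
        return holds(phi[1], u) or holds(phi[2], u)
    if op == "X":  # next: the suffix starting at position 1
        return len(u) > 1 and holds(phi[1], u[1:])
    if op == "F":  # eventually, strictly in the future
        return any(holds(phi[1], u[i:]) for i in range(1, len(u)))
    if op == "U":  # until: phi[2] at some i > 0, phi[1] at all j with 0 < j < i
        return any(
            holds(phi[2], u[i:]) and all(holds(phi[1], u[j:]) for j in range(1, i))
            for i in range(1, len(u))
        )
    raise ValueError(f"unknown operator: {op}")


def always(phi):
    """G phi, encoded as not F not phi."""
    return ("not", ("F", ("not", phi)))
```

With this encoding, ("U", FALSE, f) and ("X", f) agree on every string, as do ("U", TRUE, f) and ("F", f), which reflects the remark that X and F can be derived from U.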

The two modalities F and U have so-called stutter-invariant counterparts 
(for an explanation of the terminology, see Section 4), denoted Fsf and Usf, 
respectively. Their meaning is defined just as above except that i is allowed to 
be 0 and 0 must also be considered for j. In this regard, the modalities X, F, and 
U will be referred to as strict modalities. 

Given a temporal formula φ over some alphabet Σ and an alphabet Γ, we 
write L_Γ(φ) for the set {u ∈ Γ⁺ | u ⊨ φ} and say L_Γ(φ) is the language over Γ 
defined by φ. (Observe that if Γ is an arbitrary alphabet, φ an arbitrary formula, 
and φ′ the formula obtained from φ by replacing every alphabet symbol not from 
Γ by ⊥, then L_Γ(φ) = L_Γ(φ′). This means one can always assume that a defining 
formula only uses symbols from the alphabet of the language in question.) A 
language is said to be expressible in temporal logic (or TL-expressible) if there is 
a temporal formula that defines it. Similarly, when F is a fragment of temporal 
logic, a language is expressible in F if there exists a formula in F that defines 
it. 

A deterministic finite automaton (DFA) is a tuple A = (Σ, Q, q_I, δ, F) where 
Σ is a finite alphabet, Q a finite set of states, q_I ∈ Q the initial state, 
δ: Q × Σ → Q the transition function, and F ⊆ Q the set of final states. The 
extended transition function of A, denoted δ*, is defined by δ*(q, ε) = q for 
q ∈ Q and δ*(q, ua) = δ(δ*(q, u), a) for q ∈ Q, u ∈ Σ*, and a ∈ Σ. The language 
recognized by A, denoted L(A), is defined by L(A) = {u ∈ Σ⁺ | δ*(q_I, u) ∈ F}. 
Given a regular language L, the minimal DFA for L is denoted by A_L. 
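
As an illustration of these definitions, here is a minimal sketch of the extended transition function and of acceptance. We assume, for the example only, that the transition function is stored as a dictionary keyed by (state, symbol) pairs; the names `delta_star` and `accepts` and the sample automaton are ours.

```python
def delta_star(delta, q, u):
    """Extended transition function: read the string u symbol by symbol from state q."""
    for a in u:
        q = delta[(q, a)]
    return q


def accepts(delta, q_init, final, u):
    """Membership in the recognized language; only nonempty strings are considered."""
    return len(u) > 0 and delta_star(delta, q_init, u) in final


# Sample DFA over {a, b}: state 1 is reached once a b has been read.
CONTAINS_B = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 1, (1, "b"): 1}
```

Note that the empty string is rejected even when the initial state is final, in line with the definition of the recognized language as a subset of Σ⁺.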

When u denotes a string, then u^ρ denotes the reverse of u, i.e., if u is of 
length n, then u^ρ = u(n − 1)u(n − 2) ... u(0). Accordingly, when L denotes a 
language, then L^ρ denotes the reverse of L, i.e., the language {u^ρ | u ∈ L}. 

2 Full Temporal Logic 

It is easy to see that every language expressible in temporal logic is a regular 
language, i.e., recognizable by a DFA. This raises the question what regular 
languages are exactly the ones that are expressible in temporal logic. Recall 
that the minimal DFA recognizing a given regular language is a canonical object 
to consider when one is interested in classifying a regular language. So more 
concretely, one can ask for a structural property of DFA’s that is enjoyed by the 
minimal DFA of a given regular language if and only if the language is expressible 
in temporal logic. 

The adequate property is known as counter-freeness. Given a DFA A, a 
sequence q_0, ..., q_{m−1} of distinct states is a counter for a string u if m > 1 and 
δ*(q_i, u) = q_{i+1} for i < m where, by convention, q_m = q_0. A DFA is counter-free 
if it does not have a counter. 
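
Counter-freeness can be tested mechanically on small automata. The sketch below is one straightforward way to do this and is not taken from the paper: it enumerates all state transformations induced by nonempty strings and looks for one that has a cycle of length greater than 1, which is exactly a counter. The function names and the dictionary encoding of the transition function are our assumptions, and the enumeration may be exponential in the number of states.

```python
def transformations(states, alphabet, delta):
    """All maps q -> delta*(q, u) induced by nonempty strings u, as index tuples."""
    states = list(states)
    index = {q: i for i, q in enumerate(states)}
    gens = [tuple(index[delta[(q, a)]] for q in states) for a in alphabet]
    seen = set(gens)
    todo = list(seen)
    while todo:
        t = todo.pop()
        for g in gens:
            # Transformation of "u followed by a": first apply t, then g.
            tg = tuple(g[x] for x in t)
            if tg not in seen:
                seen.add(tg)
                todo.append(tg)
    return seen


def is_counter_free(states, alphabet, delta):
    """True if no induced transformation permutes m > 1 distinct states cyclically."""
    n = len(list(states))
    for t in transformations(states, alphabet, delta):
        for start in range(n):
            y = start
            for _ in range(n):  # after n steps y lies on a cycle of t
                y = t[y]
            if t[y] != y:  # the cycle has length > 1, so there is a counter
                return False
    return True
```

For instance, the two-state automaton that toggles its state on every a has the counter 0, 1 for the string a, whereas the two-state automaton that remembers whether a b has occurred is counter-free.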

Theorem 1. nnsi A regular language L is expressible in TL if and only if A_L 
is counter-free. 

This theorem is a simple consequence of two fundamental results: in 1971, 
McNaughton and Papert nm proved that counter-free DFA’s recognize exactly 



Classifying Discrete Temporal Properties 




the languages that are expressible in first-order logic; in 1980, Gabbay, Pnueli, 
Shelah, and Stavi ^ showed that temporal logic is as expressive as first-order 
logic. The latter result is an improvement of a result of Kamp ^ from 1968 that 
says that temporal logic with future as well as past operators is as expressive as 
first-order logic in Dedekind-complete orderings. 

The difficult implication in Theorem 1 is the one that asserts that a regular 
language L is expressible in temporal logic if A_L is counter-free. For this part 
of the theorem only a few direct proofs have been presented thus far. There 
is a journal paper by Cohen, Perrin, and Pin [Q, Maler’s thesis jSj, and an 
accompanying conference paper by Maler and Pnueli |3. Cohen et al. as well as 
Maler and Pnueli use some kind of decomposition theory (for finite semigroups or 
for finite automata); the proof presented below, from ca, avoids such theories. 

We need more terminology and notation. A pre-automaton is a triple (Σ, Q, δ) 
where Σ is a finite alphabet, Q a finite set of states, and δ: Q × Σ → Q a 
transition function. In other words, a pre-automaton is a DFA without initial 
and final states. The terminology and notation we have introduced for DFA’s 
transfers to pre-automata in a straightforward way (if applicable). For instance, 
the extended transition function of a pre-automaton and the property of being 
counter-free are defined in exactly the same way as for DFA’s. 

Given a set Q, we view the set of all functions on Q as a finite semigroup 
with composition as product operation. Given α, β: Q → Q, we write αβ for 
the composition of α and β, i.e., for the function given by q ↦ β(α(q)). For 
α: Q → Q and Q′ ⊆ Q, we write α[Q′] for the image of Q′ under α, i.e., for 
{α(q) | q ∈ Q′}. 

Let A = (Σ, Q, δ) be a pre-automaton. For every string u ∈ Σ* we define its 
transformation, denoted u^A, as follows. For every q ∈ Q we set u^A(q) = δ*(q, u), 
and we let S_A = {u^A | u ∈ Σ⁺}. Clearly, this set is closed under functional 
composition, that is, it is a subsemigroup of the semigroup of all functions on 
Q. It is called the transformation semigroup of A. For every α: Q → Q, we set 
L^A_α = {u ∈ Σ⁺ | u^A = α}. Further, L̄^A_α denotes L^A_α ∪ {ε} if α = id_Q and 
else L^A_α. Observe that if a pre-automaton as above is counter-free and u is a 
string such that u^A[Q] = Q, then u^A = id_Q. 

Proof of Theorem 1, from a counter-free DFA to a temporal formula. 

We prove that for every counter-free pre-automaton A = (Σ, Q, δ) and every 
α ∈ S_A the language L^A_α is expressible in temporal logic, which is obviously 
enough. The proof goes by induction on |Q| in the first place and then on |Σ|: 
in the induction step, we will consider pre-automata with the same state space 
but over a smaller alphabet as well as pre-automata with a smaller state space 
but over a much larger alphabet. 

We distinguish two cases. First, assume there is no symbol a ∈ Σ such that 
a^A[Q] ⊊ Q. Then a^A = id_Q for every a ∈ Σ, which means S_A = {id_Q}. This 
implies L^A_α = Σ⁺ for every α ∈ S_A, and Σ⁺ is obviously expressible in temporal 
logic. 



^ Gabbay, Pnueli, Shelah, and Stavi interpreted temporal logic and first-order 
logic in ω-words. It is, however, obvious that their result is also valid for strings. 

Second, assume b ∈ Σ is such that b^A[Q] ⊊ Q. Let Q′ = b^A[Q], Γ = Σ \ {b}, 
and let B be the pre-automaton which results from A by restricting it to the 
symbols from Γ. Further, let U_0 = Γ*b, Δ = {u^A | u ∈ U_0}, and set 
C = (Δ, Q′, δ′) where δ′(q, α) = α(q) for every q ∈ Q′ and α ∈ Δ. Finally, 
let h: U_0⁺ → Δ⁺ be the function defined by h(u_0 ... u_{n−1}) = u_0^A ... u_{n−1}^A 
for u_0, ..., u_{n−1} ∈ U_0. 

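Let α ∈ S_A. We want to show that L^A_α is TL-expressible. To this end, we 
first partition L^A_α according to how many b's occur in a string; we set 

\[
L_0 = L^{A}_{\alpha} \cap \Gamma^{+}, \qquad
L_1 = L^{A}_{\alpha} \cap \Gamma^{*} b \Gamma^{*}, \qquad
L_2 = L^{A}_{\alpha} \cap \Gamma^{*} b \Sigma^{*} b \Gamma^{*}.
\]

Then L^A_α = L_0 ∪ L_1 ∪ L_2. Next, we observe that 

\[
L_0 = L^{B}_{\alpha}, \qquad
L_1 = \bigcup_{\alpha = \beta b^{A} \beta'} \bar{L}^{B}_{\beta}\, b\, \bar{L}^{B}_{\beta'}, \qquad
L_2 = \bigcup_{\alpha = \beta b^{A} \gamma \beta'} \bar{L}^{B}_{\beta}\, b\, h^{-1}(L^{C}_{\gamma})\, \bar{L}^{B}_{\beta'},
\]

where β, β′ ∈ S_B ∪ {id_Q}, and γ ∈ S_C. Further, we see that 

\[
\begin{aligned}
\bar{L}^{B}_{\beta}\, b\, \bar{L}^{B}_{\beta'} &= \bar{L}^{B}_{\beta} b \Sigma^{*} \cap \Gamma^{*} b \bar{L}^{B}_{\beta'}, \\
\bar{L}^{B}_{\beta}\, b\, h^{-1}(L^{C}_{\gamma})\, \bar{L}^{B}_{\beta'} &= \bar{L}^{B}_{\beta} b \Sigma^{*} \cap \Gamma^{*} b\, h^{-1}(L^{C}_{\gamma}) \Gamma^{*} \cap \Sigma^{*} b \bar{L}^{B}_{\beta'}
\end{aligned}
\qquad (1)
\]

for β, β′ ∈ S_B ∪ {id_Q}, and γ ∈ S_C. 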

By induction hypothesis, we know that all L^B_β with β ∈ S_B and all L^C_γ 
with γ ∈ S_C are TL-expressible. It is now a manageable “programming task” to 
show that under these assumptions all the sets that are intersected on the 
right-hand sides of the equations in (1) are TL-expressible, which means L^A_α is 
TL-expressible, as temporal logic is closed under disjunction (union) and 
conjunction (intersection). Lemmas 1 and 2 below provide the details. □ 

Lemma 1. Let Σ be an alphabet, b ∈ Σ, and Γ = Σ \ {b}. Assume L ⊆ Σ⁺ and 
L′ ⊆ Γ⁺ are TL-expressible. Then so are Γ*bL, Γ*b(L + ε), Σ*bL′, Σ*b(L′ + ε), 
L′bΣ*, and (L′ + ε)bΣ*. 

Proof. First, let φ and ψ be formulas over Σ and Γ, respectively, such that 
L_Σ(φ) = L and L_Γ(ψ) = L′. Then 

\[
\Gamma^{*} b L = L_{\Sigma}\bigl(\neg b \;\mathrm{U_{sf}}\; (b \wedge \mathrm{X}\varphi)\bigr), \qquad
\Sigma^{*} b L' = L_{\Sigma}\bigl(\mathrm{F_{sf}}(b \wedge \mathrm{G}\neg b \wedge \mathrm{X}\psi)\bigr).
\]

The defining formulas for Γ*b(L + ε) and Σ*b(L′ + ε) can be obtained in a similar 
fashion. 

Second, we show by induction that for every temporal formula φ over Γ there 
exists a temporal formula φ⁺ such that L_Σ(φ⁺) = L_Γ(φ)bΣ*. We can simply 
set 

\[
\begin{aligned}
a^{+} &= a \wedge \mathrm{F} b, &
(\neg\varphi)^{+} &= \neg\varphi^{+} \wedge \neg b \wedge \mathrm{F} b, \\
(\varphi \wedge \psi)^{+} &= \varphi^{+} \wedge \psi^{+}, &
(\varphi \,\mathrm{U}\, \psi)^{+} &= (\varphi^{+} \wedge \neg b) \,\mathrm{U}\, (\psi^{+} \wedge \neg b),
\end{aligned}
\]

where a stands for an arbitrary element of Γ. 
Clearly, L_Σ(φ⁺ ∨ b) = (L_Γ(φ) + ε)bΣ*. □ 

Lemma 2. Let Σ, Δ be alphabets, b ∈ Σ, Γ = Σ \ {b}, and U_0 = Γ*b. Further, 
let h_0: U_0 → Δ be an arbitrary function and h: U_0⁺ → Δ⁺ be defined by 
h(u_0 ... u_{n−1}) = h_0(u_0) ... h_0(u_{n−1}) for u_0, ..., u_{n−1} ∈ U_0. For every d ∈ Δ, let 
L_d = {u ∈ Γ⁺ | h_0(ub) = d}. Assume L ⊆ Δ⁺ is expressible in temporal logic 
and also L_d for every d ∈ Δ. Then h⁻¹(L)Γ* is expressible in temporal logic. 

Proof. We show by induction that for every temporal formula g? over A there 
exists a temporal formula over E such that h~^{CA{p>))r* = For 

dGA,we either have h~^{CA{d))F* = LdbS* or h~^{CA{d))F* = {Ld~\- e)bS* . 
Thus, the induction basis follows from the previous lemma and the assumption 
that the languages Ld are TL-expressible. For the induction step, we can set 

A Fsf6 , {if A A , 

{(p U = '0^ V A {b ^ U (6 A X0^)) . □ 

The above proofs are constructive, i.e., following these proofs one can 
actually construct a temporal formula defining the language recognized by a given 
counter-free automaton. A closer analysis of the constructions sketched in the 
proofs yields the following quantitative statement. (Recall that for every 
pre-automaton with n states, the cardinality of its transformation semigroup is 
2^{O(n log n)}.) 

Corollary 1. For every counter-free DFA with at most n states and at most 
m symbols in the alphabet, there exists a temporal formula of size m 2^ * 

which defines the language recognized by the DFA. 



3 Strict Fragments 

The three basic temporal modalities are X, F, and U. So if we determine frag- 
ments of TL by disallowing the use of some of these modalities we obtain eight 
different fragments. Obviously, some of these have the same expressive power. 
For instance, the modality X as well as the modality F can be expressed using U 
only. Thus, all fragments that allow U have the expressive power of full temporal 
logic: 



TL[U] = TL[X, U] = TL[F, U] = TL[X, F, U] = TL . (2) 

By abuse of notation we use an expression like TL[X, U] to refer to the specific 
fragment of TL as well as to the class of languages expressible in this fragment. 

The identities in (2) are the only ones that hold: TL[X] and TL[F] are 
incomparable in terms of expressive power and both are stronger than TL[] and 
weaker than TL[X, F], which in turn is weaker than full temporal logic. 

The aim of this section is to provide structural properties that exactly charac- 
terize each of these fragments, just as counter-freeness characterizes expressibility 
in full temporal logic. 

3.1 Forbidden Patterns 

We need a convenient way to describe structural properties of DFA’s and 
therefore borrow the notion of “forbidden pattern” from Cohen, Perrin, and Pin jl]. 
For brevity in notation, given a transition function δ: Q × Σ → Q, we define 
a product Q × Σ* → Q by setting qu = δ*(q, u) for q ∈ Q and u ∈ Σ*. Given 
a set N, an N-labeled digraph is a tuple (V, E) where V is an arbitrary set and 
E a subset of V × N × V. The transition graph of a DFA A = (Σ, Q, q_I, δ, F) 
is the Σ⁺-labeled digraph (Q, E) where E = {(q, u, qu) | q ∈ Q and u ∈ Σ⁺}. 
So the transition graph of any DFA is an infinite graph. (It has infinitely many 
edges, but only finitely many vertices.) 

A pattern is a labeled digraph whose vertices are state variables, usually 
denoted p, q, . . . , and whose edges are labeled with variables for labels of two 
different types: variables for nonempty strings, usually denoted u, v, . . . , and 
variables for symbols, usually denoted a,b, . . . In addition, a pattern comes with 
side conditions stating which state variables are to be interpreted by distinct 
states. We will draw patterns just as we draw graphs. Consider, for instance, 
Figure 1. In this figure, as well as in all subsequent figures depicting patterns, 
we adopt the convention that all states drawn solid must be distinct. 

We say a Σ⁺-labeled digraph matches a pattern if there is an assignment 
to the variables obeying the type constraints and the side conditions so that 
the digraph obtained by replacing each variable by the value assigned to it is a 
subgraph of the given digraph. 



3.2 Classification Theorem 



Using the notion of a forbidden pattern, we can now characterize all fragments: 



Theorem 2. II1UI4I1I3I11I Let L be a regular language and F one of the 
fragments TL[], TL[X], TL[F], TL[X, F], or TL. Then L is expressible in F if and 
only if the transition graph of A_{L^ρ} does not match the pattern(s) for F depicted 
in the corresponding figure. 



Observe that in Figures Hand El the connected graphs are viewed as different 
patterns (any of which must not occur), whereas Figure Ejshows only one pattern, 
which happens to be not connected. 

The characterizations given in Theorem 2 for TL[] and TL[X] are easy to 
obtain; the characterization for TL is correct because of Theorem 1. The 
characterization for TL[X, F] was first obtained by Cohen et al. p. An alternative proof 
and a characterization for TL[F] were given in 0, using the same technique for 
both fragments. In the following two subsections, this technique is demonstrated. 

® To be precise, what is called a “forbidden pattern” here is referred to as a “forbidden 
configuration” by Cohen et al. 

Fig. 1. Patterns forbidden for TL[] 

Fig. 6. Patterns forbidden for TL 





3.3 Ehrenfeucht-Fraïssé Games 

Ehrenfeucht-Fraïssé (EF) games are a standard tool in mathematical logic to 
tackle questions about the expressive power of a logic. They allow one to reduce 
such questions to questions about the existence of strategies in specific two- 
player games, abstract away syntactical peculiarities, and thus represent the 
combinatorial core of the problems. In our situation, we will use specifically 
tailored EF games to prove correct the characterizations for TL[F] (and TL[X, F]) 
given in Theorem 2. 

An EF game for TL[F] is played by two players, Spoiler (male) and Duplicator 
(female), on a pair of nonempty strings and proceeds in several rounds. The 
number of rounds to be played is fixed in advance. In each round, a prefix of 
each of the two strings is chopped off according to a rule explained below so that 
the outcome of a round is a new pair of strings or an early win for one of the 
players if the other cannot act according to the rule. Before each round and after 
the last round, a referee checks if the two strings start with the same symbol. 
If this is not the case, the referee calls Spoiler the winner of the game. If after 
the last round Spoiler has not yet won the game, Duplicator is announced the 
winner. The rule for carrying out a round is as follows. First, Spoiler replaces one 
of the two strings by a proper, nonempty suffix of it. Then Duplicator replaces 
the other string by a proper, nonempty suffix of it. If Spoiler cannot follow this 
rule because both strings have no proper, nonempty suffix (i.e., if both strings 
are of length 1), he loses, and if Duplicator cannot reply according to the rules 
because the other string is of length 1, then Spoiler wins. 
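
For short strings the winner of such a game can be determined by exhaustive search. The following sketch implements the rules just described; it is only an illustration, the function name `spoiler_wins` is ours, and the search is exponential in the number of rounds.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def spoiler_wins(u: str, v: str, k: int) -> bool:
    """True if Spoiler has a winning strategy in the k-round game on (u, v)."""
    # Referee check before each round and after the last one.
    if u[0] != v[0]:
        return True
    if k == 0:
        return False
    # Spoiler picks one of the two strings and a proper, nonempty suffix of it.
    for mine, other in ((u, v), (v, u)):
        for i in range(1, len(mine)):
            # Duplicator must answer with a proper, nonempty suffix of the other
            # string; with no such suffix the list is empty and Spoiler wins.
            replies = [other[j:] for j in range(1, len(other))]
            if all(spoiler_wins(mine[i:], reply, k - 1) for reply in replies):
                return True
    # Either Spoiler cannot move at all, or every move can be answered.
    return False
```

For example, Spoiler wins the 1-round game on aab and ab by moving to the suffix ab of the first string, after which Duplicator's only reply, b, starts with a different symbol.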

The idea behind the game is that Spoiler tries to exhibit a difference between 
the two strings the game starts with whereas Duplicator tries to show they are 
similar. This can also be phrased in a formal way: Spoiler has a winning strategy 
in a k-round game if and only if there is a formula φ of “F depth” at most k 
that holds for one of the two strings but not for the other. The theorem that we 
will use is the following. 

Theorem 3. Let L be a language. Then L is expressible in TL[F] if and only 
if there exists a number k such that for every pair (u, v) with u ∈ L and v ∉ L, 
Spoiler has a winning strategy in the k-round game on (u, v). 



3.4 Characterization of TL[F] 

The claim that a language L is expressible in TL[F] if and only if the transition 
graph of A_{L^ρ} does not match the pattern forbidden for TL[F] follows directly 
from Lemmas 3 and 4 below. 

Lemma 3. Let L be a regular language such that the transition graph of A_{L^ρ} 
matches the pattern forbidden for TL[F]. Then L is not expressible in TL[F]. 

Proof. Let A_{L^ρ} = (Σ, Q, q_I, δ, F) and assume a, u, and v are chosen so that the 
pattern is matched. By minimality of A_{L^ρ}, there exist x, y ∈ Σ* 
such that x(uv)^l uay ∈ L^ρ iff x(uv)^l ay ∉ L^ρ, for every l ≥ 0. We show that for 
l ≥ k ≥ 0 and any choice of strings x, y ∈ Σ*, u, v ∈ Σ⁺, Duplicator wins the 
k-round game on (x(uv)^l uay)^ρ and (x(uv)^l ay)^ρ. Thus, by Theorem 3, L cannot 
be expressible in TL[F]. 

First of all, observe that playing on the first |ay| positions of the two strings 
does not help Spoiler to win the game: Duplicator will simply copy Spoiler’s 
moves. It is therefore sufficient to show that Duplicator wins the k-round game 
on (x(uv)^{l+1}u′)^ρ and (x(uv)^l u′)^ρ for l ≥ k ≥ 0 and any choice of strings x ∈ Σ*, 
u, u′, v ∈ Σ⁺ where u′ is a prefix of u. 

The proof of this claim is by induction on k. The induction base, k = 0, is 
trivial. For the inductive step, assume k > 0. Write s and t for (x(uv)^{l+1}u′)^ρ 
and (x(uv)^l u′)^ρ. First, suppose Spoiler removes a prefix of length i from t. Then 
Duplicator replies by removing a prefix of length i + |uv| from s, and the 
remaining strings will be identical. Second, assume Spoiler removes a prefix from 
s, say of length i. If i > |uv|, then Duplicator removes the prefix of length 
i − |uv| from t, and the remaining strings will be identical. If i ≤ |uv|, then 
Duplicator removes the prefix of length i from t, and the induction hypothesis 
applies for the following reason. The remaining strings are (x(uv)^{l+1}u″)^ρ and 
(x(uv)^l u″)^ρ with u″ ∈ Σ⁺ a prefix of u, or (xu(vu)^l v′)^ρ and (xu(vu)^{l−1}v′)^ρ 
with v′ a prefix of v, or (x(uv)^l u″)^ρ and (x(uv)^{l−1}u″)^ρ with u″ ∈ Σ⁺ a prefix 
of u. □ 

For the other direction we need some more notation and terminology. First, 
we write SCC(q) for the strongly connected component of a node q in a given 
digraph. Second, given a DFA A = (Σ, Q, q_I, δ, F) and a string u ∈ Σ*, the 
rank of u (with respect to A), denoted rk(u), is the cardinality of the set 
{SCC(q_I u(0, 0)), ..., SCC(q_I u(0, |u| − 2))}. 

Lemma 4. Let A be a DFA over some alphabet Σ whose transition graph does 
not match the pattern forbidden for TL[F]. Then L(A)^ρ is expressible in TL[F]. 

Proof. We prove that if u and v are nonempty strings over Σ such that q_I u ≠ 
q_I v, then Spoiler wins the (rk(u) + rk(v))-round game on u^ρ and v^ρ, by induction 
on rk(u) + rk(v). 

Write u = u′a and v = v′b for appropriate a, b ∈ Σ. If a ≠ b, then Spoiler 
wins immediately. So in the rest, assume a = b. Write p and q for q_I u′ and q_I v′. 
Clearly, SCC(p) ≠ SCC(q) in the transition graph of A, because otherwise it 
would match the pattern forbidden for TL[F]. There are three situations that 
we distinguish. 

1. Neither SCC(p) is reachable from SCC(q) nor vice versa. 

2. SCC(p) is reachable from SCC(q), but SCC(q) is not reachable from SCC(p). 

3. The same as 2. with the roles of p and q exchanged. 

First, assume we are in situation 1. Then it is not possible that q_I belongs to 
both SCC(p) and SCC(q), say it does not belong to SCC(p). Let i be minimal 
such that q_I u(0, i) ∈ SCC(p) and set p′ = q_I u(0, i). Spoiler replaces u^ρ by 
u(0, i)^ρ. Duplicator either loses immediately (because v is of length 1) or she 
replies by removing a prefix of v^ρ, say she replaces v^ρ by v(0, j)^ρ. Let q′ = 
q_I v(0, j). If we had p′ = q′, then SCC(q) would be reachable from SCC(p), a 
contradiction. Hence, p′ ≠ q′. By the minimality of i, we also have 
SCC(q_I u(0, i − 1)) ≠ SCC(p), which means rk(u(0, i)) < rk(u) and, in particular, 
rk(u(0, i)) + rk(v(0, j)) < rk(u) + rk(v), so that the induction hypothesis applies. 
Spoiler wins the remaining game with one round less. 

Second, assume we are in situation 2. Choose i as above. Spoiler does the 
same as before. Duplicator either loses immediately or she removes a prefix 
from v^ρ, say she replaces v^ρ by v(0, j)^ρ. If we had q_I u(0, i) = q_I v(0, j), then 
SCC(q) would be reachable from SCC(p), a contradiction. Just as before, we 
can apply the induction hypothesis. Situation 3 is symmetric to situation 2. □ 

Exactly the same technique works for proving the correctness of the charac- 
terization of TL[X, F]. In EF games for this fragment, the additional temporal 
modality is accounted for by an additional type of round, so-called X rounds. In 
such a round, Spoiler first chops off the first symbol of one of the two strings and 
Duplicator then chops off the first symbol of the other string. For details, see j^. 

4 Stutter-Invariant Fragments 

In Section 1 we have defined the so-called stutter-invariant counterparts of F and 
U, namely Fsf and Usf. In this section, we will obtain effective characterizations 
for the stutter-invariant fragments, TL[Fsf] and TL[Usf]. (Observe that TL[Usf] = 
TL[Fsf, Usf] and TL[X, Fsf] = TL[X, F].) 

Strings u and v are stutter-equivalent if they both belong to a language of the 
form a_0⁺ a_1⁺ ... a_k⁺ for some k and appropriate symbols a_i. We use ≡_st to denote 
stutter equivalence, and it is easy to see that ≡_st is in fact an equivalence relation. 
A language is stutter-invariant if whenever u and v are stutter-equivalent strings, 
then either u and v belong to this language or u and v do not belong to it, i.e., 
if this language is a union of stutter equivalence classes. 
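
Stutter equivalence is easy to test: two strings are stutter-equivalent exactly when they coincide after every maximal block of repeated symbols has been collapsed to a single symbol. The sketch below uses this observation; the function names `destutter` and `stutter_equivalent` are ours.

```python
from itertools import groupby


def destutter(u: str) -> str:
    """Collapse every maximal block of equal consecutive symbols to one symbol."""
    return "".join(symbol for symbol, _ in groupby(u))


def stutter_equivalent(u: str, v: str) -> bool:
    """True if u and v belong to a common language of the form a_0+ a_1+ ... a_k+."""
    return destutter(u) == destutter(v)
```

For instance, aab and abbb both collapse to ab and are therefore stutter-equivalent, whereas ab and ba are not.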

Lamport |7] observed that every language expressible in TL[Fsf , Usf] is stutter- 
invariant. This explains why Fsf and Usf are called stutter-invariant. Below, we 
prove that the converse of Lamport’s observation holds true as well, in the fol- 
lowing sense. 

Theorem 4. |3 117] Let F be one of the stutter-invariant fragments TL[Fsf] and 
TL[Usf] and let F' be its strict counterpart, TL[F] respectively TL[U]. Assume L 
is an arbitrary language. Then L is expressible in F if and only if L is expressible 
in F' and stutter-invariant. 

Observe that a regular language L is stutter-invariant if and only if the 
transition graph of A_{L^ρ} (or, equivalently, of A_L) does not match the pattern 
depicted in the corresponding figure. Thus, the above theorem (together with the 
classification theorem from the previous section) immediately leads to 
characterizations of TL[Fsf] and TL[Usf] in terms of forbidden configurations. 

Using the characterization results we have obtained so far, one can prove: 

Corollary 2. For every fragment (strict or stutter-invariant) F of temporal 
logic, the following problem is PSPACE-complete. Given a temporal formula φ, 
decide whether φ is equivalent to a formula in F. 

The upper bound follows from the fact that in polynomial time one can check 
whether or not the transition graph of a DFA matches a fixed pattern. The lower 
bound is obtained by a reduction from TL satisfiability. 
The proof of Theorem 4 makes use of the notion of a stutter-free string, which 
is defined as follows. A string u is stutter-free if $u(i) \neq u(i+1)$ for all 
$i < |u| - 1$. 

Clearly, every equivalence class of $\equiv_{st}$ contains exactly one stutter-free string. 
As a consequence of Lamport’s observation, we note: 

Lemma 5. Let L be a stutter-invariant language over some alphabet $\Sigma$ and 
$\varphi \in$ TL[Fsf, Usf] a formula over $\Sigma$ such that $u \models \varphi$ 
iff $u \in L$, for $u \in \Sigma^+$ stutter-free. Then $\varphi$ defines L. □ 

So Theorem 4 will follow once we have established the following lemma. 

Lemma 6. Let F and F' be as in Theorem 4 and assume $\varphi \in F'$. Then there 
exists $\varphi' \in F$ such that $u \models \varphi$ iff $u \models \varphi'$, 
for $u \in \Sigma^+$ stutter-free. 

Proof. The proof is an inductive definition of $\varphi'$, which works in both 
situations. The base case is trivial. In the induction step, negation and 
disjunction can be dealt with easily. What remains are formulas whose outermost 
connective is F or U. We set 

$$\varphi' = \begin{cases}
\bigvee_{a,b \in \Sigma:\, a \neq b} \bigl(a \wedge F_{sf}(b \wedge F_{sf}\,\psi')\bigr), & \text{for } \varphi = F\psi,\\
\bigvee_{a,b \in \Sigma:\, a \neq b} \bigl(a \wedge (a\; U_{sf}\; (b \wedge (\psi'\; U_{sf}\; \chi')))\bigr), & \text{for } \varphi = \psi\, U\, \chi.
\end{cases}$$

We prove only that the second choice is correct; the proof that the first choice 
is correct is even simpler. First, assume $u \models \varphi$. Then there exists 
$i > 0$ such that $u(i,*) \models \chi$ and $u(j,*) \models \psi$ for 
$j \in \{1, \dots, i-1\}$. By induction hypothesis, this means 
$u(i,*) \models \chi'$ and $u(j,*) \models \psi'$ for $j \in \{1, \dots, i-1\}$. 
Clearly, we have $u \models u(0) \wedge u(0)\; U_{sf}\; (u(1) \wedge \psi'\; U_{sf}\; \chi')$, 
which is a disjunct of $\varphi'$. 

Second, assume $u \models \varphi'$ and let a and b be symbols for which the 
corresponding disjunct holds. If $u \models a \wedge a\; U_{sf}\; (b \wedge \psi'\; U_{sf}\; \chi')$, 
then $u(0) = a$ and $u(1) = b$, since u is assumed to be stutter-free. But then 
$u(1,*) \models \psi'\; U_{sf}\; \chi'$, which implies, by induction hypothesis, 
$u(1,*) \models \psi\; U_{sf}\; \chi$, which, in turn, implies 
$u \models \psi\, U\, \chi$. □ 

This completes the first part of this survey. We have seen how every fragment 
(determined by which modalities are allowed in forming formulas) of future 
temporal logic can be characterized in an effective, concise way by describing 
structural properties of DFA’s. 



5 Past Modalities and Symmetric Fragments 

Thus far, we have only dealt with temporal modalities that refer to the future 
(and possibly the present) only. But, of course, each of the modalities considered 
has a symmetric past counterpart: S (since) goes with U, P (eventually in the 
past) goes with F, Y (previously) goes with X. 

Adding past modalities does not increase the expressive power of temporal 
logic, i.e., TL = TL[U, S]. This is easy to see because for every temporal formula 
(with future and past modalities) one can still find a counter-free DFA 
recognizing the language defined by the formula. Similarly, TL[Usf] = TL[Usf, Ssf], 
because even with past stutter-invariant modalities one can only express 
stutter-invariant languages. Clearly, TL[X] = TL[X, Y]. But the expressive power 
of any other fragment is increased by adding the corresponding past modalities. 
Nevertheless, we have: 

Theorem 5 (Decidability of Symmetric Fragments [16]). For each of the 
fragments TL[Fsf, Psf], TL[F, P], and TL[X, Y, F, P] it is decidable whether or not 
a given temporal property can be expressed in it. 

This theorem is based on similar structural characterizations as the ones 
given in Theorem 2 for the future fragments of temporal logic. There is, 
however, a fundamental difference. Instead of looking at the minimal DFA for a 
given language, one considers its syntactic semigroup, which, by definition, is 
symmetric in the sense that the syntactic semigroup of the reverse of a lan- 
guage is the reverse of the syntactic semigroup of the language, and thus better 
suited for investigating symmetric fragments. The proofs get more involved and 
require non-trivial finite semigroup theory. On the other hand, they also reveal 
interesting connections to first-order logic. 

Remember that Kamp’s theorem says that temporal logic (with future modalities 
only or with both) is as expressive as first-order logic. In this statement, a 
string $u \in \Sigma^+$ of length n is viewed as a structure in the signature 
with a binary predicate <, for the order relation on the positions, and unary 
predicates $P_a$, one for each alphabet symbol a. 

A simple induction shows that every temporal formula is equivalent to a first- 
order formula, even to a first-order formula that uses at most three variables. 
In view of Kamp’s theorem, this means that temporal logic and first-order logic 
with three variables have the same expressive power. Reducing the number of 
variables to two leads to TL[F, P] and TL[X, Y, F, P], respectively: 

Theorem 6 (Kamp’s Theorem for Smaller Fragments fllij). 

1. A language is expressible in TL[F, P] if and only if it is expressible in first- 
order logic with two variables. 

2. A language is expressible in TL[X,Y, F, P] if and only if it is expressible in 
first-order logic with two variables when the signature is extended by the built-in 
predicate suc for successor. 

There are more connections to first-order logic and to formal language the- 
ory. First, the languages expressible in TL[F, P] are exactly the unambiguous 
languages in the sense of Schützenberger uni. Second, the languages expressible 
in TL[F, P] and TL[X, Y, F, P] are exactly the languages expressible by a 
$\Sigma_2$ and, at the same time, a $\Pi_2$ formula (over the respective 
signature). For details, see [TTlj . 

6 Until Hierarchy 

Which temporal modalities are needed to express a given temporal property is 
the first question to ask when one is interested in studying the expressive power 
of the temporal modalities themselves, but there are other, equally important 
ones, and some prominent ones are concerned with the “until hierarchy” of future 
temporal logic. The “until” modality is special in several respects. First, it is 
complete in the sense that no other modality is needed to express all temporal 
properties. Second, it is the only binary modality. The last fact is crucial; it 
makes formulas hard to read, especially when nesting occurs. So the question is 
whether or not nesting of “until” is necessary, even when the other modalities can 
be used for free.¹ Using an appropriate Ehrenfeucht-Fraïssé game with additional 
types of rounds corresponding to X and U, one can actually show that the more 
nesting is allowed, the more one can express: 

Theorem 7 (Strictness of Until Hierarchy [3]). Let $\Sigma = \{a, b, c\}$ be a 
three-element alphabet and define $\varphi_n$, $n \ge 0$, by $\varphi_0 = a$ and 
$\varphi_{n+1} = a \wedge X(b\; U\; \varphi_n)$. Then $\varphi_n$ is of until 
nesting depth n, but the language defined by $\varphi_n$ is not definable by a 
formula of until nesting depth < n. 
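
As a small illustration of the syntactic side of this statement, the following sketch builds the formulas of Theorem 7 as nested tuples and computes their until nesting depth. The tuple encoding and the names `phi` and `until_depth` are assumptions made for this example only; the sketch says nothing about the (much harder) semantic lower bound.

```python
def phi(n):
    """Build phi_n with phi_0 = a and phi_{n+1} = a AND X(b U phi_n)."""
    formula = ("sym", "a")
    for _ in range(n):
        formula = ("and", ("sym", "a"), ("X", ("U", ("sym", "b"), formula)))
    return formula


def until_depth(formula):
    """Maximal number of nested U operators along any branch of the formula."""
    tag = formula[0]
    if tag == "sym":
        return 0
    deepest_child = max(until_depth(child) for child in formula[1:])
    return deepest_child + (1 if tag == "U" else 0)


for n in range(4):
    print(n, until_depth(phi(n)))
```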

We even have: 

Theorem 8 (Computability of Until Depth HI). Given a temporal formula 
$\varphi$, one can compute the minimal until nesting depth required to express 
the language defined by $\varphi$. 

The proof of this theorem, just as the proof of Theorem 5, makes heavy 
use of finite semigroup theory. A key ingredient of the proof is the so-called 
semidirect product/substitution principle, which, in rough outline, states that if 
two fragments of temporal logic, say F and G, are characterized by classes V 
and W of finite semigroups, then the fragment which is obtained by substituting 
formulas of G into formulas of F is characterized by the semidirect product of 
V and W. Applied to the until hierarchy, this principle says that the k-th level 
of the hierarchy is characterized by a k-th power of the class of semigroups that 
characterizes level 1. (Observe that a formula of until depth at most k can be 
written as a k-fold substitution of formulas of depth at most 1, and vice versa.) 
For details, see m or m 

7 Conclusion 

The results presented in this survey show that there are intimate connections 
between temporal logic, the theory of finite automata, the theory of finite semi- 
groups, and first-order logic. The classification of discrete temporal properties 
has been accomplished to a great extent. A problem that is still open is the 
decidability of the combined until/since hierarchy, where a property is classified 
according to the nesting depth in U and S required to express it using future as 
well as past modalities. Note that this hierarchy is known to be strict, see E]. 

^ In the literature, other binary modalities (such as “at next” E] or “as long as” Q) 
have been occasionally used, and these operators are as powerful as “until.” In fact, 
nesting depth with regard to any of these two operators is exactly the same as nesting 
depth with respect to “until.” 
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Abstract. In this paper we extend the area of applications of the Ab- 
stract Harmonic Analysis to the field of Boolean function complexity. 
In particular, we extend the class of functions to which a spectral tech- 
nique developed in a series of works of the first author can be applied. 
This extension allows us to prove that testing square-free numbers by 
unbounded fan-in circuits of bounded depth requires a superpolynomial 
size. This implies the same estimate for the integer factorization problem. 



1 Introduction 



In recent years spectral techniques based on the Abstract Harmonic Analysis 
on the hypercube have been shown to represent a very useful tool for obtaining 
lower complexity bounds. Various links between Fourier coefficients of Boolean 
functions and their complexity characteristics have been studied in a number of 
works, see |2ldl41bliSI 1 41 1 9121 )l2,''il24j . In particular, these spectral techniques have 
been successfully applied to the parity function and to threshold functions. 

However, a limitation of this approach to the study of Boolean function 
complexity is that, besides the results for parity and threshold functions, spectral 
methods have provided lower bounds for specially constructed Boolean func- 
tions, which are not related to any particular number theoretic or combinatorial 
problem. In fact, there are very few known examples of functions coming from 
natural number theoretic or combinatorial problems for which the spectral tech- 
niques have produced non-trivial results. The only examples we are aware of 
are the lower bounds on integer multiplication m and on the complexity of 
computing the discrete logarithm There are also some very interesting 

results about determinants it™ . 

In this paper we pursue two purposes: 



o extend the area of applications of the spectral techniques to the study of 
Boolean function complexity; 

o obtain the first non-trivial lower bound on the circuit complexity of testing 
square-free numbers. 



C. Meinel and S. Tison (Eds.): STACS’99, LNCS 1563, pp. 47-^3 1999. 
(c) Springer-Verlag Berlin Heidelberg 1999 
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To this aim, we first provide a generalization of the spectral technique devel- 
oped in m for getting lower bounds on the size complexity of Boolean functions 
computed by constant-depth circuits. 

We then apply the generalized technique to evaluate the complexity of the 
Boolean function which decides whether a given (n+1)-bit odd integer is 
square-free, that is, the function for which 

$$f(x_1, \dots, x_n) = \begin{cases}
1, & \text{if } 2x+1 \text{ is square-free},\\
0, & \text{if } 2x+1 \text{ is square-full},
\end{cases} \qquad (1)$$

where $2x+1 = x_1 \dots x_n 1$ is the bit representation of $2x+1$, 
$0 \le x \le 2^n - 1$ (if necessary we add several leading zeros). 
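
For small inputs the function (1) can be evaluated directly. The sketch below assumes that `bits` lists $x_1, \dots, x_n$ with the most significant bit first, as the bit representation above suggests; the names `is_square_free` and `f` are chosen for illustration, and trial division is used only because it is the simplest correct test.

```python
def is_square_free(m: int) -> bool:
    """True if no square d*d with d >= 2 divides m (trial division)."""
    d = 2
    while d * d <= m:
        if m % (d * d) == 0:
            return False
        d += 1
    return True


def f(bits) -> int:
    """Function (1): bits = (x_1, ..., x_n), the tested integer is x_1...x_n 1 in binary."""
    x = 0
    for b in bits:
        x = 2 * x + b
    return 1 if is_square_free(2 * x + 1) else 0


print(f((0, 1, 1)))  # 2x+1 = 7 -> 1
print(f((1, 0, 0)))  # 2x+1 = 9 -> 0
```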

More precisely, we provide an estimate for the Fourier coefficients of the 
function (1) and derive a complexity lower bound showing that this function does 
not belong to the complexity class $AC^0$. 

In [10, 25], some lower bounds are obtained for the function deciding whether a 
given integer x is a quadratic residue modulo p. Here we show that some of 
the techniques used there can be applied to the function (1). This approach 
is based on the uniformity of distribution of square-free numbers with some 
fixed binary digits. For quadratic residuosity a similar property has been 
established by using the very powerful Weil estimate. Here we use a sieve method. 

Notice that our estimate complements the results of izg on polynomial 
representations of the Boolean function deciding whether a given integer x is 
square-free. Moreover, it provides the first non-trivial lower bound on the 
circuit complexity of a number theoretic problem which is closely related to the 
integer factorization problem. 

Some results of this paper have recently been generalized in . Several more 
relevant results can also be found in m 



2 Basic Definitions 

Let $\mathcal{B}_n = \{0, 1\}^n$ denote the n-dimensional Boolean cube. 

We will use the notation $|f|$ to denote the number of strings accepted by 
the function f, that is, $|f| = |\{w \in \mathcal{B}_n \mid f(w) = 1\}|$. 
Moreover, $p_f$ denotes the probability that the function f takes the value 1 
(over the uniform distribution), that is, $p_f = |f|/2^n$. 

Given a binary string $w \in \mathcal{B}_n$, we denote by $|w|$ the Hamming 
weight of w, which is the number of ones in w. 

An unbounded fan-in Boolean circuit C with input variables $x_1, \dots, x_n$ 
consists of several levels of AND, OR and NOT gates. The gates at the bottom 
level accept values from the input variables $x_1, \dots, x_n$. Each of the other 
gates may accept output values from any number of gates of the previous levels. 
The only top level gate contains the output $C(x_1, \dots, x_n)$. For a more detailed 
description, see mm- 

The number of levels is called the depth of the circuit, the number of gates 
is called the size. 
The class of $AC^0$ circuits consists of circuits whose size is bounded by a 
polynomial in n, and whose depth is bounded by a constant. 

A restriction $\rho$ is a mapping of the set of the subscripts of input variables 
$x_1, \dots, x_n$ to the set {0, 1, *}, where 

o $\rho(i) = 0$ means that we substitute the value 0 for $x_i$; 

o $\rho(i) = 1$ means that we substitute the value 1 for $x_i$; 

o $\rho(i) = *$ means that $x_i$ remains a variable. 

Given a function f depending on n binary variables, we will denote by $f_\rho$ 
the function obtained from f by applying the restriction $\rho$; $f_\rho$ will be 
a function of the variables $x_i$ for which $\rho(i) = *$, $1 \le i \le n$. 

The subscripts i and the corresponding variables $x_i$ are called fixed if 
$\rho(i) \in \{0, 1\}$, and free if $\rho(i) = *$. 

We recall that an integer x is called square-free if there is no prime p such 
that $p^2 \mid x$. Otherwise x is called square-full. 

Throughout the paper we denote by log x the binary logarithm of x . 

3 Abstract Harmonic Analysis and Circuits 

We give some background on abstract harmonic analysis on the hypercube. We 
refer to for a more detailed exposition. 

We consider the space $\mathcal{F}$ of all the two-valued functions on 
$\mathcal{B}_n$. The domain of $\mathcal{F}$ is a locally compact Abelian group 
and the elements of its range, that is 0 and 1, can be added and multiplied as 
complex numbers. The above properties allow one to analyze $\mathcal{F}$ by using 
tools from harmonic analysis. This means that it is possible to construct an 
orthogonal basis set of Fourier transform kernel functions for $\mathcal{F}$. The 
kernel functions of the Fourier transform are defined in terms of a group 
homomorphism from $\mathcal{B}_n$ to the direct product of n copies of the 
multiplicative subgroup {±1} on the unit circle of the complex plane. 
The functions $Q_w(x) = (-1)^{w_1 x_1} (-1)^{w_2 x_2} \cdots (-1)^{w_n x_n} = (-1)^{w^T x}$ 
are known as Fourier transform kernel functions, and the set 
$\{Q_w \mid w \in \mathcal{B}_n\}$ is an orthogonal basis for $\mathcal{F}$. 

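We can now define the Abstract Fourier Transform of a Boolean function 
f as the rational valued function $f^*$ which defines the coordinates of f with 
respect to the basis $\{Q_w(x) \mid w \in \mathcal{B}_n\}$, that is 

$$f^*(w) = 2^{-n} \sum_{x \in \mathcal{B}_n} Q_w(x) f(x) = 2^{-n} \sum_{x \in \mathcal{B}_n} (-1)^{w^T x} f(x).$$

Then 

$$f(x) = \sum_{w \in \mathcal{B}_n} Q_w(x) f^*(w) = \sum_{w \in \mathcal{B}_n} (-1)^{w^T x} f^*(w)$$

is the Fourier expansion of f. 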

It is interesting to note that the zero-order Fourier coefficient, that is the 
coefficient related to the all zeros string, is equal to the probability that the 
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function takes the value 1, while the other Fourier coefficients measure the cor- 
relation between the function and the parity of subsets of its input bits (see m 
for more details). 

As a consequence of the orthogonality of the functions $Q_w$, it is also possible 
to derive a very useful identity, the Parseval identity: 

$$\sum_{v \in \mathcal{B}_n} (f^*(v))^2 = 2^{-n} \sum_{v \in \mathcal{B}_n} f(v) = f_0^*, \qquad (2)$$

where $f_0^*$ denotes the zero-order Fourier coefficient. 
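
The transform and the Parseval identity can be checked by brute force on a small example. The following is only a sketch with exact rational arithmetic: the helper name `fourier` and the choice of the parity function on three variables are assumptions made for illustration, and the enumeration is exponential in n.

```python
from fractions import Fraction
from itertools import product


def fourier(f, n):
    """All coefficients f*(w) = 2^-n * sum_x (-1)^(w.x) f(x), keyed by the tuple w."""
    cube = list(product((0, 1), repeat=n))
    coefficients = {}
    for w in cube:
        total = sum((-1) ** sum(wi * xi for wi, xi in zip(w, x)) * f(x) for x in cube)
        coefficients[w] = Fraction(total, 2 ** n)
    return coefficients


def parity(x):
    return sum(x) % 2


n = 3
coeffs = fourier(parity, n)
p_f = Fraction(sum(parity(x) for x in product((0, 1), repeat=n)), 2 ** n)

print(coeffs[(0,) * n] == p_f)                      # zero-order coefficient equals p_f
print(sum(c * c for c in coeffs.values()) == p_f)   # Parseval identity for a 0/1-valued f
```

For parity only the all-zeros and the all-ones coefficient are non-zero, which matches its description later in the text as a function of level 1.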

We finally present an interesting application of harmonic analysis to circuit 
complexity which is due to Ell- 

Lemma 1. Let f be a Boolean function on n variables computable by a Boolean 
circuit of depth d and size M, and let $\theta$ be any integer. Then 

$$\sum_{|w| > \theta} (f^*(w))^2 \le \frac{1}{2}\, M\, 2^{-\theta^{1/d}/20},$$

where the sum is taken over all strings $w \in \mathcal{B}_n$ of cardinality $|w| > \theta$. □ 

4 A Technique to Prove Lower Bounds on the Size/Depth 
of Circuits 

In P] and Pj a new technique has been developed with the aim of proving lower 
bounds on the size-complexity of Boolean functions presenting a rather strong 
combinatorial structure. This technique is based both on the abstract harmonic 
analysis on the hypercube, and on the spectral characterization of the size-depth 
trade-off of Boolean circuits which has been given in Lemma 1. 

Let $f: \mathcal{B}_n \to \{0, 1\}$ be a Boolean function depending on n 
variables. Now, let k, $1 \le k \le n$, be the smallest integer such that f has 
the following property: for any subfunction $f_\rho$ depending on k variables, 
$p_{f_\rho} = p_f$, where $p_{f_\rho} = |f_\rho|/2^k$. 
In this case, we say that the function f is of level k (see [3] for more details). 

Then, if f is computable by a circuit of constant depth d and size M, it is 
possible to derive a lower bound on the size M of such a circuit, which depends 
both on the probability $p_f$ and on the level k: 

$$M \ge (p_f - p_f^2)\, 2^{(n-k)^{1/d}/20 + 1}.$$

Notice that this result can be viewed as a generalization of the exponential 
lower bound for the size of constant depth circuits computing the parity func- 
tion |ni 1 4| . Indeed, parity and its complement are the only two non-constant 
Boolean functions of level 1, see [2]. 
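
For very small n the level of a function can be found by exhaustive search over all restrictions. The sketch below reads "subfunction depending on k variables" as "restriction with exactly k free variables", which is an assumption about the definition; the names `density` and `level` are illustrative.

```python
from fractions import Fraction
from itertools import combinations, product


def density(f, n, free, fixed_values):
    """Probability that f equals 1 when the positions in `free` vary and the rest are fixed."""
    fixed = [i for i in range(n) if i not in free]
    ones = 0
    for values in product((0, 1), repeat=len(free)):
        x = [0] * n
        for i, b in zip(free, values):
            x[i] = b
        for i, b in zip(fixed, fixed_values):
            x[i] = b
        ones += f(tuple(x))
    return Fraction(ones, 2 ** len(free))


def level(f, n):
    """Smallest k such that every subfunction with k free variables has density p_f."""
    p_f = density(f, n, tuple(range(n)), ())
    for k in range(1, n + 1):
        if all(density(f, n, free, fixed_values) == p_f
               for free in combinations(range(n), k)
               for fixed_values in product((0, 1), repeat=n - k)):
            return k


print(level(lambda x: sum(x) % 2, 3))  # parity: 1
print(level(lambda x: int(all(x)), 3))  # AND of three variables: 3
```

The search always terminates with some k, because for k = n the only subfunction is f itself.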

The above lower bound can be proved by combining Lemma 1 with some 
results of 0. 






The paper [2] also gives a complete characterization of functions of level k. 
A Boolean function $f: \mathcal{B}_n \to \{0, 1\}$ is of level k if and only if 
$f_0^* = p_f$ and $f^*(w) = 0$ for any string w such that $0 < |w| \le n - k$. 

We now show how the above technique can be generalized in order to be 
applied also to functions which present such combinatorial structure only in an 
“approximate sense” . 

A Boolean function $f: \{0, 1\}^n \to \{0, 1\}$ is called δ-approximately of level 
k if 

$$|p_{f_\rho} - p_f| \le \delta$$

for any subfunction $f_\rho$ depending on at least k variables. 

In the following theorem we derive a spectral characterization of functions 
δ-approximately of level k. 

Theorem 1. Let $f: \mathcal{B}_n \to \{0, 1\}$ be δ-approximately of level k. Then 

$$|f^*(w)| \le \delta$$

for any string w such that $0 < |w| \le n - k$. 

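Proof. Let $\mu = (\mu_1, \mu_2, \dots, \mu_n)$ be a Boolean string such that 
$0 < |\mu| = n - \ell \le n - k$. Moreover, let $I = \{i \mid \mu_i = 1\}$. 

For any string $u \in \{0, 1\}^{n-\ell}$, let $f_{\rho_{\mu,u}}$ denote the 
subfunction defined by the restriction $\rho_{\mu,u}$ that assigns to the 
variables $x_i$ such that $i \in I$ the $(n - \ell)$ values taken from the 
string u, and leaves free the other $\ell$ variables. Then, we have 

$$f^*(\mu) = \frac{1}{2^n} \sum_{w \in \mathcal{B}_n} Q_\mu(w) f(w)
= \frac{1}{2^n} \sum_{w \in \mathcal{B}_n} (-1)^{\mu^T w} f(w)
= \frac{1}{2^n} \sum_{u \in \mathcal{B}_{n-\ell}} (-1)^{|u|} \sum_{v \in \mathcal{B}_\ell} f_{\rho_{\mu,u}}(v)
= \frac{1}{2^n} \sum_{u \in \mathcal{B}_{n-\ell}} (-1)^{|u|}\, |f_{\rho_{\mu,u}}|.$$

For any $u \in \mathcal{B}_{n-\ell}$, the subfunction $f_{\rho_{\mu,u}}$ depends 
on $\ell \ge k$ variables and, since f is δ-approximately of level k, we have 

$$\bigl|\, |f_{\rho_{\mu,u}}| - 2^\ell p_f \bigr| \le 2^\ell \delta.$$

Thus, we get 

$$|f^*(\mu)| = \frac{1}{2^n} \Bigl| 2^\ell p_f \sum_{u \in \mathcal{B}_{n-\ell}} (-1)^{|u|} + \sum_{u \in \mathcal{B}_{n-\ell}} (-1)^{|u|} \bigl(|f_{\rho_{\mu,u}}| - 2^\ell p_f\bigr) \Bigr|
= \frac{1}{2^{n-\ell}} \Bigl| \sum_{u \in \mathcal{B}_{n-\ell}} (-1)^{|u|} \bigl(p_{f_{\rho_{\mu,u}}} - p_f\bigr) \Bigr|
\le \frac{1}{2^{n-\ell}} \sum_{u \in \mathcal{B}_{n-\ell}} \bigl|p_{f_{\rho_{\mu,u}}} - p_f\bigr| \le \delta,$$

where the first sum vanishes because $n - \ell > 0$, and the result immediately 
follows. □ 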
We are now able to state and prove a theorem which provides a lower bound 
on the size required by a depth d circuit to compute functions which are 
δ-approximately of level k. 

Theorem 2. Let $f: \mathcal{B}_n \to \{0, 1\}$ be a function δ-approximately of 
level k. If f is computable by a circuit of constant depth d and size M, then 

$$M \ge 2^{(n-k)^{1/d}/20 + 1} \bigl(p_f - p_f^2 - \delta^2\, 2^{(n-k)\log n}\bigr).$$

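Proof. An application of Lemma 1 yields the following inequality: 

$$M \ge 2^{\theta^{1/d}/20 + 1} \sum_{|w| > \theta} (f^*(w))^2.$$

Let us choose $\theta = n - k$. Then, by using the Parseval identity (2), we obtain 

$$\sum_{|w| > n-k} (f^*(w))^2 = \sum_{w \in \mathcal{B}_n} (f^*(w))^2 - (f_0^*)^2 - \sum_{1 \le |w| \le n-k} (f^*(w))^2
= p_f - p_f^2 - \sum_{1 \le |w| \le n-k} (f^*(w))^2,$$

where, as before, $f_0^*$ denotes the zero-order Fourier coefficient. 

We are now left with the evaluation of the sum of the squares of the Fourier 
coefficients of order less than or equal to our threshold $n - k$. From 
Theorem 1 it follows that 

$$\sum_{1 \le |w| \le n-k} (f^*(w))^2 \le \sum_{j=1}^{n-k} \binom{n}{j} \delta^2 \le \delta^2\, 2^{(n-k)\log n},$$

where we have applied the inequality 

$$\sum_{j=1}^{n-k} \binom{n}{j} \le n^{n-k}.$$

Therefore, we obtain 

$$\sum_{|w| > n-k} (f^*(w))^2 \ge p_f - p_f^2 - \delta^2\, 2^{(n-k)\log n},$$

and the result follows. □ 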

Note that such a lower bound turns out to be meaningful provided that 

$$\delta^2\, 2^{(n-k)\log n} = o(p_f).$$
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5 Circuit Complexity of Testing Square-Free Numbers 



First of all we need a result about the uniformity of distribution of odd square- 
free numbers with some fixed binary digits. 

Let $\rho$ be a restriction on the set {1, ..., n}. We denote by $N_\rho(n)$ the 
set of integers x, $0 \le x \le 2^n - 1$, such that $x_i = \rho(i)$ for all fixed 
subscripts $i \in \{1, \dots, n\}$, where $x_1 \dots x_n 1$ is the bit 
representation of $2x+1$. We also denote by $S_\rho(n)$ the number of 
$x \in N_\rho(n)$ for which $2x+1$ is square-free. 

Lemma 2. For any restriction $\rho$ with $r \le n^{1/2}/3 - 1$ fixed positions, 

$$S_\rho(n) = \frac{8}{\pi^2}\, 2^{n-r} + O\bigl(2^{n-r-n/3(r+1)}\bigr).$$

Proof. Let $T_\rho(n, m)$ be the number of $x \in N_\rho(n)$ with 
$m^2 \mid 2x+1$. From the inclusion-exclusion principle it follows that 

$$S_\rho(n) = \sum_{\substack{1 \le m \le 2^{(n+1)/2} \\ m \equiv 1 \pmod 2}} \mu(m)\, T_\rho(n, m),$$

where $\mu(m)$ is the Möbius function. We recall that $\mu(1) = 1$, $\mu(m) = 0$ 
if m is square-full and $\mu(m) = (-1)^{\nu(m)}$ otherwise, where $\nu(m)$ is the 
number of prime divisors of $m \ge 2$. 

Let t be the length of the largest substring of free positions. It is obvious 
that the elements of $N_\rho(n)$ can be separated into $2^{n-r-t}$ groups such 
that in each group the numbers are of the form $2^s z + a$, 
$0 \le z \le 2^t - 1$, for some integers s and a. 

For an odd integer $m \ge 1$, each such group contains $2^t/m^2 + O(1)$ numbers 
which are congruent to zero modulo $m^2$. Taking into account that 
$t \ge n/(r+1)$, we then obtain 

$$T_\rho(n, m) = \frac{2^{n-r}}{m^2} + O\bigl(2^{n-r-n/(r+1)}\bigr).$$

Put $K = 2^{2n/3(r+1)}$. Applying the above asymptotic formula for $m \le K$ and 
the trivial bound $T_\rho(n, m) \le 2^n/m^2 + 1$ for $m > K$, we obtain 

$$S_\rho(n) = \sum_{\substack{1 \le m \le K \\ m \equiv 1 \pmod 2}} \mu(m) \Bigl( \frac{2^{n-r}}{m^2} + O\bigl(2^{n-r-n/(r+1)}\bigr) \Bigr)
+ O\Bigl( \sum_{\substack{K < m \le 2^{(n+1)/2} \\ m \equiv 1 \pmod 2}} \frac{2^n}{m^2} \Bigr).$$

From Theorem 287 of we derive 
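$$\sum_{\substack{1 \le m \le K \\ m \equiv 1 \pmod 2}} \frac{\mu(m)}{m^2}
= \sum_{\substack{m = 1 \\ m \equiv 1 \pmod 2}}^{\infty} \frac{\mu(m)}{m^2} + O(K^{-1})
= \sum_{m=1}^{\infty} \frac{\mu(m)}{m^2} - \sum_{\substack{m = 1 \\ m \equiv 0 \pmod 2}}^{\infty} \frac{\mu(m)}{m^2} + O(K^{-1})$$

$$= \frac{4}{3} \sum_{m=1}^{\infty} \frac{\mu(m)}{m^2} + O(K^{-1})
= \frac{4}{3} \cdot \frac{6}{\pi^2} + O(K^{-1}) = \frac{8}{\pi^2} + O(K^{-1}).$$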
Therefore, $S_\rho(n) = 8\pi^{-2}\, 2^{n-r} + O\bigl(2^{n-r-n/(r+1)} K + 2^n K^{-1}\bigr)$. 
Finally, since for $r \le n^{1/2}/3 - 1$ the first term in the ‘O’-symbol 
dominates, the result follows. □ 
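
The inclusion-exclusion step and the constant $8/\pi^2$ can be checked numerically in the unrestricted case, that is, when no positions are fixed. The sketch below is an illustration under that assumption only; the function names are hypothetical, and for a trivial restriction $T_\rho(n, m)$ is simply the number of odd multiples of $m^2$ not exceeding $2^{n+1} - 1$.

```python
import math


def mobius(m: int) -> int:
    """Moebius function by trial division."""
    result, p = 1, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0  # p*p divides the original argument
            result = -result
        p += 1
    return -result if m > 1 else result


def is_square_free(y: int) -> bool:
    return all(y % (d * d) for d in range(2, math.isqrt(y) + 1))


def count_direct(n: int) -> int:
    """Number of x in [0, 2^n) such that 2x+1 is square-free."""
    return sum(is_square_free(2 * x + 1) for x in range(2 ** n))


def count_by_sieve(n: int) -> int:
    """Same count via the sum of mu(m) * T(n, m) over odd m."""
    top = 2 ** (n + 1) - 1  # largest value taken by 2x+1
    total = 0
    for m in range(1, math.isqrt(top) + 1, 2):
        odd_multiples = (top // (m * m) + 1) // 2  # T(n, m) without restriction
        total += mobius(m) * odd_multiples
    return total


n = 12
print(count_direct(n) == count_by_sieve(n))
print(count_direct(n) / 2 ** n, 8 / math.pi ** 2)
```

The two counts agree exactly, and the empirical density is close to $8/\pi^2 \approx 0.8106$.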

At this point we are able to derive our main result, namely a lower bound 
on the Boolean circuit complexity of testing square- free numbers. 

Theorem 3. Assume that the Boolean function f given by (1) is computed by 
an unbounded fan-in Boolean circuit C of depth d and of size M. Then, for 
sufficiently large n, 

$$d \log\log M \ge 0.5 \log n + O(\log\log n).$$

Proof. Put $k = n - \lfloor n^{1/2} \log^{-1} n \rfloor - 1$. It follows from 
Lemma 2 that, for sufficiently large n, f is δ-approximately of level k with 
$p_f = 8/\pi^2$ and $\delta = \exp(-c\, n^{1/2} \log n)$, where c > 0 is some 
absolute constant. Applying Theorem 2 we derive the desired statement. □ 

In particular, if the depth d is a constant, then the size turns out to be 
super-polynomial, $M \ge \exp(c n^{\gamma})$, for some constants c > 0 and 
γ > 0. In particular, this means that testing square-free numbers, and thus 
integer factorization, cannot be done by a circuit of the class $AC^0$. This 
result has recently been improved 
in 0 , where it is shown that for any prime p, testing square- free numbers as 
well as primality testing and testing co-primality of two given integers cannot be 
computed by $AC^0[p]$ circuits, that is, $AC^0$ circuits enhanced by $MOD_p$ gates. 

Apparently the result of Lemma 2 can be improved by means of some more 
sophisticated sieve methods (see for instance m)- This may possibly improve 
the constant 0.5 in Theorem 3. 



6 Concluding Remarks 



It would be very interesting to obtain analogous results for other Boolean func- 
tions related to number theoretic problems, for example for Boolean functions 
deciding primality or the parity of the number of prime divisors of the input x . 
Unfortunately, sieve techniques even more advanced than those used in Lemma 2 
are still not powerful enough to produce such results, even under the assumption 
of the Extended Riemann Hypothesis. 

We also remark that some elementary number theoretic considerations 
have been used in IZS! to obtain a very tight lower bound on the sensitivity of 
the function which decides whether its input x is a square-free integer. 

Recall that the sensitivity, $\sigma(f)$, of a Boolean function 
$f: \mathcal{B}_n \to \{0, 1\}$ (which is also known as the critical complexity) 
is defined as the largest integer $s \le n$ such that there is a binary vector 
$x \in \mathcal{B}_n$ for which $f(x) \neq f(x^{(i)})$ for s values of i, 
$1 \le i \le n$, where $x^{(i)}$ is the vector obtained from x by flipping 
its ith coordinate. The average sensitivity is defined in a similar way as the 
average value of s taken over all $x \in \mathcal{B}_n$. Thus 

$$\sigma(f) = \max_{x \in \mathcal{B}_n} \sum_{i=1}^{n} \bigl|f(x) - f(x^{(i)})\bigr|
\quad \text{and} \quad
\sigma_{av}(f) = 2^{-n} \sum_{x \in \mathcal{B}_n} \sum_{i=1}^{n} \bigl|f(x) - f(x^{(i)})\bigr|.$$
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In |2ti| it has been shown that for the function g(x ) , deciding if an n-bit integer 
x is square-free, the bound $\sigma(g) \ge \lfloor n/60 \rfloor$ holds. 

This sensitivity is of interest because it can be used to obtain lower bounds 
for the CREW PRAM complexity of a Boolean function / (see lllll2l22l2Vl h 
that is, the complexity on a parallel random access machine with an unlimited 
number of all-powerful processors, such that simultaneous reads of a single 
memory cell by several processors are permitted, but simultaneous writes are not. 
In particular, from the above bound on $\sigma(g)$ one immediately concludes that 
the CREW PRAM complexity of g is at least $0.5 \log n + O(1)$, see 

It is also known that the average sensitivity can be expressed via the Fourier 
coefficients of / and related to the formula size complexity of / and to the degree 
of the polynomial approximation of / over the reals, see mm . Applying our 
results, one can derive the estimate $\sigma_{av}(f) \ge c\, n^{1/2} \log^{-1} n$ 
for the function f given by (1), where c > 0 is an absolute constant. However, 
using a more direct 
approach, it is shown in [Z| that in fact 

4 

(Tavif) > -^n + oin). 

This bound implies several other results about various complexity characteristics 
of f, such as the formula size, the average decision tree depth and the degrees 
of exact and approximate polynomial representations of this function over the 
reals. It should be noted that for some of the above applications it is essential 
to obtain more than the $n^{1/2}$ lower bound for $\sigma_{av}(f)$. 
Abstract. Circuit size, branching program size, and formula size of 
Boolean functions, denoted by $C(f)$, $BP(f)$, and $L(f)$, are the most 
important complexity measures for Boolean functions. Often also the 
formula size $L^*(f)$ over the restricted basis {∨, ∧, ¬} is considered. It 
is well-known that $C(f) \le 3\,BP(f)$, $BP(f) \le L^*(f)$, $L^*(f) \le L(f)^2$, 
and $C(f) \le L(f) - 1$. These estimates are optimal. But the inequality 
$BP(f) \le L(f)^2$ can be improved to $BP(f) \le 1.360\, L(f)^\beta$, where 
$\beta = \log_4(3 + \sqrt{5}) < 1.195$. 



1 Introduction 

Circuits, branching programs and formulas are the most important and 
well-studied computation models for Boolean functions $f \in B_n$, i.e., 
$f: \{0, 1\}^n \to \{0, 1\}$. For circuits and formulas it is most natural to use 
the full binary basis $B_2$, but for formulas also the restricted basis 
{∨, ∧, ¬} is of interest. For the sake of convenience we define these 
computation models and the corresponding complexity measures. 

Definition 1. 

(1) A circuit over $X_n = \{x_1, \dots, x_n\}$ is defined as a sequence 
$G_1, \dots, G_c$ of gates. A gate $G_j$ is a triple $(op_j, I_j, I_j')$ where 
$op_j \in B_2$ and $I_j, I_j' \in \{0, 1, x_1, \dots, x_n, G_1, \dots, G_{j-1}\}$. 
The inputs are also considered as functions. At gate $G_j$ the operation $op_j$ 
is applied to the functions represented at the inputs $I_j$ and $I_j'$. The 
circuit complexity $C(f)$ of $f \in B_n$ is the minimal number of gates to 
compute f. 

(2) A formula over $X_n$ is a circuit where each gate may be used at most once 
as input of another gate (i.e., the underlying graph is a tree if the inputs 
may be duplicated). The formula size $L(f)$ of $f \in B_n$ is the minimal number 
of inputs (or leaves) of a formula representing f. By $L^*(f)$ we denote the 
formula size for formulas where only ∨, ∧ and ¬ are allowed and negations 
are given for free. 

* The first and second author have been supported by DFG grant We 1066/8-1. 
(3) A branching program (BP) over $X_n$ consists of a directed acyclic graph 
$G = (V, E)$ and a labelling of the nodes and edges of G. The graph has two sinks, 
labelled by 0 and 1, and the inner nodes (non-sink nodes) have outdegree 2 
and get labels from $X_n$. For each inner node one of the outgoing edges gets 
the label 0 and the other one the label 1. Each node v represents a Boolean 
function $f_v \in B_n$. For the computation of $f_v(a)$, $a \in \{0, 1\}^n$, we 
follow a path in G starting at v and leading to one of the sinks. At nodes 
labelled by $x_i$ the outgoing edge labelled by $a_i$ is chosen. Then $f_v(a)$ is 
equal to the label of the reached sink. The branching program complexity 
$BP(f)$ of $f \in B_n$ is the minimal number of inner nodes to compute f. 
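
The evaluation rule in (3) can be sketched in a few lines. The dictionary encoding used here (node name mapped to a triple of variable index, 0-successor and 1-successor, with the integers 0 and 1 as sinks and 0-based variable indices) is an assumption made for this example and not a representation taken from the paper.

```python
def evaluate(bp, source, a):
    """Follow the path from `source`, taking the edge labelled a[i] at a node labelled x_i."""
    node = source
    while node not in (0, 1):  # the integers 0 and 1 are the two sinks
        variable, low, high = bp[node]
        node = high if a[variable] else low
    return node


# A BP with three inner nodes for x_0 XOR x_1.
xor_bp = {
    "s": (0, "n0", "n1"),
    "n0": (1, 0, 1),  # reached when x_0 = 0: output x_1
    "n1": (1, 1, 0),  # reached when x_0 = 1: output NOT x_1
}

for a in ((0, 0), (0, 1), (1, 0), (1, 1)):
    print(a, evaluate(xor_bp, "s", a))
```

In this encoding the size of the branching program is simply the number of dictionary entries, here 3.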

The consideration of these computation models does not need a further mo- 
tivation, since they are well-established. The following relations between the 
complexity measures are well-known (see, e.g., 0). 

Theorem 1. For all $f \in B_n$, 

(1) $C(f) \le 3\,BP(f)$, 

(2) $BP(f) \le L^*(f)$, 

(3) $L^*(f) \le L(f)^2$, and 

(4) $C(f) \le L(f) - 1$. 

These estimates are optimal. Krapchenko p] has proved that the parity of n 
variables has an $L^*$-complexity of $n^2$. From Theorem 1 we obtain the 
estimate $BP(f) \le L(f)^2$. The relationship between branching programs and 
formulas has been studied in another context. The famous result of Barrington 
states that formulas of depth O(log n), i.e., $NC^1$-functions, can be represented 
by polynomial-size branching programs of width 5. Cleve 0 and Cai and Lip- 
ton P) have improved this simulation with respect to the size. They also have 
considered more general types of branching programs. Their simulations focussed 
on formulas with a given depth. With respect to formula size, these results have 
implications only if the formulas are well-balanced. The main result of this paper 
is the new estimate 

$$BP(f) \le 1.360\, L(f)^\beta, \quad \text{where } \beta = \log_4(3 + \sqrt{5}) < 1.195. \qquad (1)$$

In Section 2, we present our simulation and the analysis is performed in Section 3. 
In Section 4, we prove that the analysis of our simulation is optimal and we 
discuss whether our simulation itself is optimal. 

2 Simulating Formulas by Branching Programs 

The task is to construct a branching program (of small size) for a given formula 
F. Let $F = F_l \otimes F_r$, where ⊗ is the operation at the root of the 
underlying tree of the formula and $F_l$, $F_r$ are the two subformulas of F. For 
the ease of notation, we use the same name for a formula and its represented 
function. It is obvious that $BP(F) = BP(\neg F)$, since it is sufficient to 
negate the sinks. Therefore, it is sufficient to consider the two cases 
⊗ = ∧ and ⊗ = ⊕. 
Algorithm 1.

We describe a recursive procedure for the construction of a branching program
$G(F)$ for the function represented by a formula $F$ over $X_n = \{x_1, \ldots, x_n\}$.
By $\mathrm{size}(F)$ we denote the number of inner nodes of the constructed BP. The
recursion stops for subformulas with one leaf, which are replaced by BPs with
one inner node, i.e., $\mathrm{size}(x_i) = 1$. Our construction for the two cases $F = F_l \wedge F_r$
and $F = F_l \oplus F_r$ is shown in Fig. 1.

Case 1: $F = F_l \wedge F_r$.

The BP combines the BP $G(F_l)$ for $F_l$ and the BP $G(F_r)$ for $F_r$. The 1-sink of
$G(F_l)$ is replaced by the source of $G(F_r)$. Obviously,

$\mathrm{size}(F) = \mathrm{size}(F_l) + \mathrm{size}(F_r)$. (2)



Case 2: $F = F_l \oplus F_r$.

Here we combine $G(F_l)$ and a BP $G(F_r, \overline{F_r})$ with two sources representing $F_r$
and $\overline{F_r}$. The 0-sink of $G(F_l)$ is replaced by the $F_r$-source of $G(F_r, \overline{F_r})$ and the
1-sink of $G(F_l)$ is replaced by the $\overline{F_r}$-source of $G(F_r, \overline{F_r})$. Using the evaluation
rule of BPs it follows that the 1-sink is reached iff $F_l(a) = 0$ and $F_r(a) = 1$,
or $F_l(a) = 1$ and $\overline{F_r}(a) = 1$, which means $F_r(a) = 0$. Hence, we realize $F$. We
may also interchange the roles of $F_l$ and $F_r$. Of course, we choose the better
alternative. Hence,

$\mathrm{size}(F) = \min\{\mathrm{size}(F_l) + \mathrm{size}(F_r, \overline{F_r}),\ \mathrm{size}(F_r) + \mathrm{size}(F_l, \overline{F_l})\}$. (3)




Fig. 1. BPs for $F$ where $F = F_l \wedge F_r$ and $F = F_l \oplus F_r$.



Now we are faced with a new problem. How do we obtain a BP for the pair
$(F_r, \overline{F_r})$ and in general for $(F, \overline{F})$? Again we consider the two cases $F = F_l \wedge F_r$
and $F = F_l \oplus F_r$, which are illustrated in Fig. 2. As terminal case we get
$\mathrm{size}(x_i, \overline{x_i}) = 2$.

Fig. 2. BPs for $(F, \overline{F})$ where $F = F_l \wedge F_r$ and $F = F_l \oplus F_r$.

Case 3: $F = F_l \wedge F_r$ and representation of $(F, \overline{F})$.

We use the BPs $G(F_l)$, $G(\overline{F_l})$, and $G(F_r, \overline{F_r})$. The BP $G(\overline{F_l})$ is obtained by
copying $G(F_l)$ and negating the sinks. As $F$-source we choose the source of
$G(F_l)$ and as $\overline{F}$-source the source of $G(\overline{F_l})$. The 1-sink of $G(F_l)$ is replaced by
the $F_r$-source of $G(F_r, \overline{F_r})$ and the 0-sink of $G(\overline{F_l})$ is replaced by the $\overline{F_r}$-source
of $G(F_r, \overline{F_r})$. It is easy to verify that we represent $(F, \overline{F})$. For the computation of
$F = F_l \wedge F_r$, we first evaluate $F_l$. If $F_l(a) = 0$, then $F(a) = 0$. If $F_l(a) = 1$, then
$F(a) = F_r(a)$ and $F$ is correctly evaluated. For the computation of $\overline{F} = \overline{F_l} \vee \overline{F_r}$,
we first evaluate $\overline{F_l}$. If $\overline{F_l}(a) = 0$, then $\overline{F}(a) = \overline{F_r}(a)$, and if $\overline{F_l}(a) = 1$, then
$\overline{F}(a) = 1$. Hence, $\overline{F}$ is also correctly evaluated. Again we may interchange the
roles of $F_l$ and $F_r$ and choose the better alternative. We obtain

$\mathrm{size}(F, \overline{F}) = \min\{2 \cdot \mathrm{size}(F_l) + \mathrm{size}(F_r, \overline{F_r}),\ 2 \cdot \mathrm{size}(F_r) + \mathrm{size}(F_l, \overline{F_l})\}$. (4)

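Case 4: $F = F_l \oplus F_r$ and representation of $(F, \overline{F})$.

We combine the BPs $G(F_l, \overline{F_l})$ and $G(F_r, \overline{F_r})$ as illustrated in Fig. 2. As $F$-source
we choose the $F_l$-source of $G(F_l, \overline{F_l})$ and as $\overline{F}$-source its $\overline{F_l}$-source. The 0-sink
of $G(F_l, \overline{F_l})$ is replaced by the $F_r$-source of $G(F_r, \overline{F_r})$ and the 1-sink of $G(F_l, \overline{F_l})$
is replaced by the $\overline{F_r}$-source of $G(F_r, \overline{F_r})$. By a simple case inspection as above
it can be shown that we represent $(F, \overline{F})$ and

$\mathrm{size}(F, \overline{F}) = \mathrm{size}(F_l, \overline{F_l}) + \mathrm{size}(F_r, \overline{F_r})$. (5)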

We have not obtained any new case and our construction is complete. □ 
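
The four cases can be exercised on small formulas. The following Python sketch is an illustration only, not the authors' implementation: the tuple encoding of formulas, the class name `Builder`, and the sink labels `"sink0"` and `"sink1"` are assumptions made here. For brevity it always evaluates the left subformula first instead of choosing the cheaper of the two alternatives in Cases 2 and 3, so it checks correctness of the wiring rather than minimality.

```python
from itertools import product

# A formula is a variable name (str) or a tuple (op, left, right)
# with op in {"and", "xor"}; this encoding is an assumption of the sketch.

class Builder:
    """Builds BPs as a dict: node id -> (variable, 0-successor, 1-successor)."""

    def __init__(self):
        self.nodes = {}

    def _node(self, var, lo, hi):
        nid = len(self.nodes)
        self.nodes[nid] = (var, lo, hi)
        return nid

    def single(self, f, t0, t1):
        """Source of G(F) whose 0-sink is replaced by t0 and whose 1-sink by t1."""
        if isinstance(f, str):
            return self._node(f, t0, t1)
        op, left, right = f
        if op == "and":   # Case 1: the 1-sink of G(F_l) becomes the source of G(F_r)
            return self.single(left, t0, self.single(right, t0, t1))
        # Case 2: 0-sink of G(F_l) -> F_r-source, 1-sink -> (not F_r)-source
        src_r, src_not_r = self.pair(right, t0, t1)
        return self.single(left, src_r, src_not_r)

    def pair(self, f, t0, t1):
        """Sources (for F, for not F) of G(F, not F) with sinks t0 and t1."""
        if isinstance(f, str):
            return self._node(f, t0, t1), self._node(f, t1, t0)
        op, left, right = f
        src_r, src_not_r = self.pair(right, t0, t1)
        if op == "and":   # Case 3: G(F_l) plus a copy of it with negated sinks
            return (self.single(left, t0, src_r),
                    self.single(left, t1, src_not_r))
        return self.pair(left, src_r, src_not_r)   # Case 4


def run(nodes, source, assignment):
    """Follow the path selected by the assignment; True iff the 1-sink is reached."""
    cur = source
    while cur not in ("sink0", "sink1"):
        var, lo, hi = nodes[cur]
        cur = hi if assignment[var] else lo
    return cur == "sink1"


def value(f, assignment):
    """Direct evaluation of the formula, used only as a reference."""
    if isinstance(f, str):
        return bool(assignment[f])
    op, left, right = f
    a, b = value(left, assignment), value(right, assignment)
    return (a and b) if op == "and" else (a != b)


F = ("xor", ("and", "x1", "x2"), ("and", "x3", "x4"))
builder = Builder()
source = builder.single(F, "sink0", "sink1")
variables = ["x1", "x2", "x3", "x4"]
agrees = all(
    run(builder.nodes, source, dict(zip(variables, bits)))
    == value(F, dict(zip(variables, bits)))
    for bits in product([0, 1], repeat=4)
)
```

For this symmetric example the fixed evaluation order loses nothing: the constructed program has 2 + 4 = 6 inner nodes, which is what equations (2) to (4) give.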

Our main interest in this paper is to prove that the construction in Algorithm 1
leads to a branching program which is small with respect to the size of the given
formula. Nevertheless, it is worth mentioning that the algorithm is efficient. The
following result follows directly from the description of the algorithm.

Proposition 1. The construction of the branching program described in
Algorithm 1 can be performed in linear time with respect to the size of the resulting
branching program. If we are only interested in the size of the resulting branching
program, this can be computed in linear time with respect to the size of the given
formula.
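
If only the sizes are needed, equations (2) to (5) can be evaluated bottom-up in one pass over the formula. A minimal sketch, assuming formulas are encoded as nested tuples (an encoding chosen here for illustration, not taken from the paper); the recursive form is fine for small examples, while very deep formulas would need an explicit stack:

```python
def sizes(f):
    """Return (size(F), size(F, not F)) according to equations (2)-(5).

    f is a variable name (str) or a tuple (op, left, right) with
    op in {"and", "xor"}; each subformula is visited exactly once.
    """
    if isinstance(f, str):
        return 1, 2                      # size(x_i) = 1, size(x_i, not x_i) = 2
    op, left, right = f
    s_l, t_l = sizes(left)
    s_r, t_r = sizes(right)
    if op == "and":
        return s_l + s_r, min(2 * s_l + t_r, 2 * s_r + t_l)   # (2) and (4)
    if op == "xor":
        return min(s_l + t_r, s_r + t_l), t_l + t_r           # (3) and (5)
    raise ValueError(f"unsupported operation: {op!r}")


parity2 = ("xor", "x1", "x2")
unbalanced = ("and", "x1", ("xor", "x2", ("and", "x3", "x4")))
print(sizes(parity2), sizes(unbalanced))
```

In the unbalanced example the minimum in (3) matters: evaluating the larger subformula first and duplicating only the single leaf is cheaper than the other way round.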

3 Estimating the Size of the Constructed Branching 
Program 

In this section we estimate $\mathrm{size}(F)$, the size of the branching program constructed
by Algorithm 1, with respect to the size of the given formula $F$. By the
description of the algorithm it follows that we have to consider $\mathrm{size}(F)$ and $\mathrm{size}(F, \overline{F})$
simultaneously. We do not know of a standard technique for the analysis.

It has turned out to be reasonable to bound $\varphi(\mathrm{size}(F), \mathrm{size}(F, \overline{F}))$ for some
suitable function $\varphi: \mathbb{R}_+^2 \to \mathbb{R}_+$. We have chosen

$\varphi(s,t) := \max(a_1 s + a_2 t,\ a_1' s + a_2' t,\ a_1'' s + a_2'' t)$. (6)

Then we can prove the bound $\mathrm{size}(F) \le \alpha\, l^{\beta}$, where $l$ is the number of leaves
of $F$, $\alpha := \varphi(1,2)/\varphi(1,1)$ and $\beta := \log_4(3 + \sqrt{5}) = \log_2 \sqrt{3 + \sqrt{5}}$. Having this
result (which is proven later) in mind we see that we do not have to care about
constant factors of $\varphi$. Moreover, the parameters in the definition of $\varphi$ are chosen
in order to minimize the upper bound. First let

Pi := ^ (3 + = 2 ^-\ p2-.= (2 + V5) (3 + 75)”^^"= (2 + 75)2-^, 

91:=^ (3 + ^5) g2:=^(l + v^)- 

Then we define $a_1, a_2, a_1', a_2', a_1'', a_2''$ as the unique solution of the following
system of linear equations.

aiPl + 02P2 = 1, a'lPl + 02^2 = 1, 

a'lqi + 0292 = 1, a' 1'91 + 0292 = 1, 

O2 = 0, oi -t“ 2 o 2 ~ 2 ^(30]^ -t“ 402)- 

One can verify that $a_1, a_2, a_1', a_2', a_1'', a_2''$ are nonnegative.

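Lemma 1. Let $s, t, \hat{s}, \hat{t}$ be nonnegative real numbers with

$s \le t \le 2s$, $\hat{s} \le \hat{t} \le 2\hat{s}$, $\varphi(s,t) \le m^{\beta}$, and $\varphi(\hat{s},\hat{t}) \le \hat{m}^{\beta}$.

Then we have

$\varphi(s + \hat{s},\ 2s + \hat{t}) \le (m + \hat{m})^{\beta}$ if $s < \hat{s}$ or $s = \hat{s} \wedge \hat{t} \le t$,

and

$\varphi(s + \hat{t},\ t + \hat{t}) \le (m + \hat{m})^{\beta}$ if $\hat{t} < t$ or $\hat{t} = t \wedge s \le \hat{s}$.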
Proof. We only consider $\varphi$ on the region of all $(s,t)$ with $s \le t \le 2s$. This
region is shown in Fig. 3. One can verify that the set of all $(s,t)$ with $\varphi(s,t) = 1$
consists of three segments which meet in $p = (p_1, p_2)$ and $q = (q_1, q_2)$, as shown
in Fig. 3. Therefore, we divide the considered region into three sectors by the
lines $t = (p_2/p_1)s$ and $t = (q_2/q_1)s$. We number the sectors from bottom to top
by I, II, and III. Note that $\varphi$ is linear within each sector.





Fig. 3. Function $\varphi$.

Fig. 4. Triangle $D$.



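We start with the proof of the first inequality. From now on assume that
$s, \hat{s}, t, \hat{t}$ fulfill the condition for the first claim of the lemma. It is sufficient to
show that $\mu$ is nonnegative, where $\mu: \mathbb{R}_+^4 \to \mathbb{R}$ is defined by

$\mu(s,\hat{s},t,\hat{t}) := \left(\varphi(s,t)^{1/\beta} + \varphi(\hat{s},\hat{t})^{1/\beta}\right)^{\beta} - \varphi(s + \hat{s},\ 2s + \hat{t})$. (7)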

Denote by $\varphi_1(s,t)$ the partial derivative of $\varphi(s,t)$ with respect to $s$ and,
accordingly, denote by $\varphi_2(s,t)$ the partial derivative with respect to $t$. $\mu$ is
monotonically increasing in $\hat{s}$ since it is continuous and, on every line in $\hat{s}$-direction,
$\partial\mu/\partial\hat{s}$ is defined on all but at most four points with

$\frac{\partial\mu}{\partial\hat{s}}(s,\hat{s},t,\hat{t}) = \left(1 + \left(\frac{\varphi(s,t)}{\varphi(\hat{s},\hat{t})}\right)^{1/\beta}\right)^{\beta-1} \cdot \varphi_1(\hat{s},\hat{t}) - \varphi_1(s + \hat{s},\ 2s + \hat{t})$

$\ge \varphi_1(\hat{s},\hat{t}) - \varphi_1(s + \hat{s},\ 2s + \hat{t}) \ge 0$. (8)

The last inequality follows from the fact that, in the positive quadrant, moving in
the direction of the vector (1, 2) never increases $\varphi_1$. The function $\varphi_1$ is constant
within each of the Sectors I, II, and III, with the smallest value in I and the
largest value in III. Since the lines bounding the sectors have a slope of at most
2, the last inequality is correct. Thus, we can further restrict ourselves to the
case of $s = \hat{s}$, $\hat{t} \le t$. Since $\mu$ is multiplicative, we can even assume $s = \hat{s} = 1$.
Now define $\tilde{\mu}$ on the triangle $D \subset \mathbb{R}^2$ with the corners (1, 1), (2, 1), (2, 2) by

$\tilde{\mu}(t,\hat{t}) := \mu(1,1,t,\hat{t}) = \left(\varphi(1,t)^{1/\beta} + \varphi(1,\hat{t})^{1/\beta}\right)^{\beta} - \varphi(2,\ 2 + \hat{t})$. (9)

It remains to show that $\tilde{\mu}$ is nonnegative on $D$. As shown in Fig. 4, the four
lines $t = q_2/q_1$, $t = p_2/p_1$, $\hat{t} = q_2/q_1$, and $\hat{t} = p_2/p_1$ partition $D$ into six
regions, which are triangles and rectangles. Restricted to each of these regions,
$\tilde{\mu}$ is a "linear transformation" of the function $\psi: \mathbb{R}_+^2 \to \mathbb{R}_+$ defined by
$\psi(z,\hat{z}) := \left(z^{1/\beta} + \hat{z}^{1/\beta}\right)^{\beta}$. This means that there are linear functions $l_1, l_2, l_3$ such that

$\tilde{\mu}(t,\hat{t}) = l_1(t,\hat{t}) + \psi(l_2(t), l_3(\hat{t}))$. (10)

Note that also $\varphi(2, 2 + \hat{t})$ is linear on the intervals $[1, q_2/q_1]$, $[q_2/q_1, p_2/p_1]$ and
$[p_2/p_1, 2]$, since $2 + q_2/q_1 = 2p_2/p_1$. Since $\psi$ is concave on $\mathbb{R}_+^2$ (the matrix of
second order partial derivatives is negative semi-definite), $\tilde{\mu}$ is concave on each
of the six regions of $D$. Thus the minimum value of $\tilde{\mu}$ on $D$ appears on a corner
of one of the six regions.

It is $\tilde{\mu}(q_2/q_1, q_2/q_1) = 0$. One can easily verify that $\tilde{\mu}$ is positive on the other
nine corners of the regions. Thus, $\tilde{\mu}$ is nonnegative on $D$ and we have proved the
first inequality of the lemma.

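The second claim of the lemma is proved analogously. Assume that $s, \hat{s}, t, \hat{t}$
fulfill the condition of the second claim. Setting

$\nu(s,\hat{s},t,\hat{t}) := \left(\varphi(s,t)^{1/\beta} + \varphi(\hat{s},\hat{t})^{1/\beta}\right)^{\beta} - \varphi(s + \hat{t},\ t + \hat{t})$ (11)

we need to show that $\nu(s,\hat{s},t,\hat{t}) \ge 0$. Since

$\frac{\partial\nu}{\partial t}(s,\hat{s},t,\hat{t}) = \left(1 + \left(\frac{\varphi(\hat{s},\hat{t})}{\varphi(s,t)}\right)^{1/\beta}\right)^{\beta-1} \cdot \varphi_2(s,t) - \varphi_2(s + \hat{t},\ t + \hat{t})$

$\ge \varphi_2(s,t) - \varphi_2(s + \hat{t},\ t + \hat{t}) \ge 0$, (12)

$\nu$ is monotonically increasing in $t$. Again, the last inequality follows from the
fact that moving in direction of (1, 1) never increases $\varphi_2$. Thus we can restrict
ourselves to $t = \hat{t} = 1$ and $s \le \hat{s}$. Define $\tilde{\nu}$ on the triangle $D' \subset \mathbb{R}^2$ with the
corners (0.5, 0.5), (0.5, 1), (1, 1) by

$\tilde{\nu}(s,\hat{s}) := \nu(s,\hat{s},1,1) = \left(\varphi(s,1)^{1/\beta} + \varphi(\hat{s},1)^{1/\beta}\right)^{\beta} - \varphi(s + 1,\ 2)$. (13)

The four lines $s = p_1/p_2$, $s = q_1/q_2$, $\hat{s} = p_1/p_2$, and $\hat{s} = q_1/q_2$ divide $D'$
into six regions. On each of those regions $\tilde{\nu}$ is a linear transformation of $\psi$,
and thus concave (note that even $\varphi(s + 1, 2)$ is linear on every region since
$p_1/p_2 + 1 = 2q_1/q_2$). Evaluating $\tilde{\nu}$ on the ten corners of the regions shows
that $\tilde{\nu}(p_1/p_2, p_1/p_2) = 0$, $\tilde{\nu}(0.5, 0.5) = 0$, and $\tilde{\nu} > 0$ on the other corners. This
completes the proof of the second claim. □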
Lemma 2. Let $F$ be a formula of size $l$. Then we have
$\varphi(\mathrm{size}(F), \mathrm{size}(F, \overline{F})) \le \varphi(1,2)\, l^{\beta}$,
where $\beta := \log_4(3 + \sqrt{5}) < 1.1943$.

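Proof. We prove the lemma by induction on $l$. For $l = 1$ the claim holds since
$\mathrm{size}(F) \le 1$ and $\mathrm{size}(F, \overline{F}) \le 2$ in this case.

Let $l > 1$ and let $F = F_l \otimes F_r$, where $F_l$ and $F_r$ are subformulas of size $k$
and $m$, respectively. Set $a := \varphi(1,2)$. By the induction hypothesis we have

$\varphi(\mathrm{size}(F_l), \mathrm{size}(F_l, \overline{F_l})) \le a\,k^{\beta}$, (14)

$\varphi(\mathrm{size}(F_r), \mathrm{size}(F_r, \overline{F_r})) \le a\,m^{\beta}$. (15)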

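We distinguish two cases according to $\otimes$.
Case 1: $\otimes = \wedge$.

W.l.o.g., assume that $\mathrm{size}(F_l) < \mathrm{size}(F_r)$, or $\mathrm{size}(F_l) = \mathrm{size}(F_r) \wedge \mathrm{size}(F_r, \overline{F_r}) \le \mathrm{size}(F_l, \overline{F_l})$.
Then Cases 1 and 3 in Section 2 together with the first inequality
of Lemma 1 yield

$\varphi(\mathrm{size}(F), \mathrm{size}(F, \overline{F})) \le \varphi(\mathrm{size}(F_l) + \mathrm{size}(F_r),\ 2\,\mathrm{size}(F_l) + \mathrm{size}(F_r, \overline{F_r}))$

$\le \left(a^{1/\beta} k + a^{1/\beta} m\right)^{\beta} = a\,l^{\beta}$. (16)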



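Case 2: $\otimes = \oplus$.

W.l.o.g., assume that $\mathrm{size}(F_r, \overline{F_r}) < \mathrm{size}(F_l, \overline{F_l})$, or $\mathrm{size}(F_r, \overline{F_r}) = \mathrm{size}(F_l, \overline{F_l}) \wedge \mathrm{size}(F_l) \le \mathrm{size}(F_r)$.
Then Cases 2 and 4 in Section 2 together with the second
part of Lemma 1 yield

$\varphi(\mathrm{size}(F), \mathrm{size}(F, \overline{F})) \le \varphi(\mathrm{size}(F_l) + \mathrm{size}(F_r, \overline{F_r}),\ \mathrm{size}(F_l, \overline{F_l}) + \mathrm{size}(F_r, \overline{F_r}))$

$\le \left(a^{1/\beta} k + a^{1/\beta} m\right)^{\beta} = a\,l^{\beta}$. (17)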

□ 

Theorem 2. Let $\alpha = \varphi(1,2)/\varphi(1,1)$ and $\beta = \log_4(3 + \sqrt{5})$. Then $BP(f) \le \alpha\, L(f)^{\beta}$,
$\alpha < 1.3592$, and $\beta < 1.195$.

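Proof. Let $F$ be an optimal formula for $f$. By the preceding lemma it holds that
$\varphi(\mathrm{size}(F), \mathrm{size}(F, \overline{F})) \le \varphi(1,2)\, L(f)^{\beta}$. Using the fact that $\varphi(1,y)$ is
monotonically increasing in $y$ and that, for $s, t, c \in \mathbb{R}_+$, $\varphi(cs, ct) = c\,\varphi(s,t)$, we have

$\mathrm{size}(F) = \varphi(\mathrm{size}(F), \mathrm{size}(F, \overline{F})) \cdot \varphi(1, \mathrm{size}(F, \overline{F})/\mathrm{size}(F))^{-1}$

$\le \varphi(1,2)\, L(f)^{\beta}\, \varphi(1,1)^{-1} = \alpha\, L(f)^{\beta}$. (18)

The estimates for $\alpha$ and $\beta$ follow by numerical calculations. □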
4 On the Optimality of the Simulation

The first question is whether the analysis of our simulation is optimal. This is 
indeed the case, as the following example shows. 

Definition 2. The function Alternating Tree, $AT_k$, is defined on $n = 2^k$
variables, for an odd number $k$. Let $AT_1(x_1, x_2) := x_1 \oplus x_2$ and

$AT_k(u,v,x,y) := (AT_{k-2}(u) \wedge AT_{k-2}(v)) \oplus (AT_{k-2}(x) \wedge AT_{k-2}(y))$

for disjoint variable vectors $u$, $v$, $x$, and $y$ of length $2^{k-2}$.

Theorem 3. The formula size of $AT_k$ equals $n = 2^k$. The size of the branching
program constructed by Algorithm 1 for the optimal formula obtained from the
definition of $AT_k$ equals

$c_1 r^{(k-1)/2} + c_2 s^{(k-1)/2}$,

where

$c_1 = \frac{1}{10}(15 + 7\sqrt{5})$, $c_2 = \frac{1}{10}(15 - 7\sqrt{5})$, $r = 3 + \sqrt{5}$, and $s = 3 - \sqrt{5}$.

This bound is at least $1.340\, n^{\beta} - o(1)$.

Proof. The result on the formula size is obvious. Because of the symmetry in the
definition of $AT_k$ we are able to analyse the size of the resulting BP. Let $S_k$ be
the BP size for $AT_k$ and $T_k$ be the size for $(AT_k, \overline{AT_k})$. It is easy to check by case
inspection that $S_1 = 3$ and $T_1 = 4$. The definition of $AT_k$ can be abbreviated by
$F = (F_1 \wedge F_2) \oplus (F_3 \wedge F_4)$, where $F_1$, $F_2$, $F_3$, and $F_4$ are the same formulas on
different variable sets. Therefore, $F_l = F_r$ in the Cases 2 and 3 in Algorithm 1
and the two terms from which we have to take the minimal one are equal. Hence,
by the case inspection in Algorithm 1,

$\mathrm{size}(F) = \mathrm{size}(F_1 \wedge F_2) + \mathrm{size}((F_3 \wedge F_4), \overline{(F_3 \wedge F_4)})$

$= \mathrm{size}(F_1) + \mathrm{size}(F_2) + 2 \cdot \mathrm{size}(F_3) + \mathrm{size}(F_4, \overline{F_4})$, and

$\mathrm{size}(F, \overline{F}) = \mathrm{size}((F_1 \wedge F_2), \overline{(F_1 \wedge F_2)}) + \mathrm{size}((F_3 \wedge F_4), \overline{(F_3 \wedge F_4)})$

$= 2 \cdot \mathrm{size}(F_1) + \mathrm{size}(F_2, \overline{F_2}) + 2 \cdot \mathrm{size}(F_3) + \mathrm{size}(F_4, \overline{F_4})$. (19)

Since $F_1 = F_2 = F_3 = F_4 = AT_{k-2}$ for $F = AT_k$, this leads to

$S_k = 4S_{k-2} + T_{k-2}$, and $T_k = 4S_{k-2} + 2T_{k-2}$. (20)

The exact solution of these linear difference equations follows by standard
techniques. □
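
The recurrences (20) are easy to check numerically. The sketch below iterates them from $S_1 = 3$, $T_1 = 4$ and compares $S_k$ with the closed form $c_1 r^{(k-1)/2} + c_2 s^{(k-1)/2}$. The constants $c_{1,2} = (15 \pm 7\sqrt{5})/10$ used in the code are what solving (20) with these initial values gives; the function names are ours and purely illustrative.

```python
import math

def at_sizes(k):
    """Iterate (20): returns (S_k, T_k) for odd k, starting from S_1 = 3, T_1 = 4."""
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd integer")
    s, t = 3, 4
    for _ in range((k - 1) // 2):
        s, t = 4 * s + t, 4 * s + 2 * t   # both right-hand sides use the old s, t
    return s, t

def closed_form(k):
    """c1 * r^j + c2 * s^j with j = (k - 1) / 2, obtained by solving (20)."""
    sqrt5 = math.sqrt(5)
    c1, c2 = (15 + 7 * sqrt5) / 10, (15 - 7 * sqrt5) / 10
    r, s = 3 + sqrt5, 3 - sqrt5
    j = (k - 1) // 2
    return c1 * r ** j + c2 * s ** j

beta = math.log(3 + math.sqrt(5), 4)
for k in (1, 3, 5, 7, 9):
    s_k, _ = at_sizes(k)
    n = 2 ** k
    # columns: k, S_k from the recurrence, closed form, ratio S_k / n^beta
    print(k, s_k, round(closed_form(k), 6), round(s_k / n ** beta, 4))
```

The ratio in the last column settles near 1.34, in line with the constant stated in Theorem 3, since $r^{k/2} = n^{\beta}$ and the term in $s$ vanishes as $k$ grows.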

This result proves that our analysis is optimal, but it says nothing about the 
optimality of the simulation. For this problem it would be interesting to know the 
branching program complexity of the alternating tree function. In this situation 
we have to complain about the lack of powerful lower bound techniques for the 
branching program complexity of explicitly defined Boolean functions. The most
powerful technique due to Neciporuk |2j always gives even larger bounds on the 
formula size and is useless for our purposes. Otherwise there are only bounds of 
quasilinear size like the bounds of Babai, Pudlak, Rodl, and Szemeredi Q. They 
consider symmetric Boolean functions and for these functions the best known 
upper bounds on the formula size are much larger than their lower bounds on 
the branching program size. We finish the paper with the observation that the 
largest trade-off between branching program and formula size can already be 
proved if we restrict ourselves to read-once formulas. 

Definition 3. A read-once formula is a formula whose leaves are labelled by 
different variables. 



Theorem 4. Let $L(l)$ be the class of Boolean functions whose formula size
equals $l$ and let $BP(l)$ be the class of all functions with the maximal branching
program complexity of all functions in $L(l)$. Then $BP(l)$ also contains a
function $f \in L(l)$ representable by a read-once formula of size $l$.

Proof. Let $f \in BP(l)$. Then we define $f^*$ by replacing the leaves of an optimal
formula $F$ for $f$ by $x_1, \ldots, x_l$. Obviously, $f^* \in L(l)$. It is sufficient to prove
$BP(f^*) \ge BP(f)$. This is also easy. Let $G$ be an optimal branching program for
$f^*$. Then we may reverse the replacement of the leaves of $F$ by $x_1, \ldots, x_l$ and
obtain a branching program for $f$ of size $BP(f^*)$. □

Hence, we have reduced our problem to the following one. 

Problem 1. Determine the maximal branching program complexity of functions
representable by read-once formulas of size $l$.

Because of Theorem 3 the following problem is also of interest.

Problem 2. Determine the branching program complexity of the alternating tree 
function. 
Abstract. We address the problem of organizing a set $T$ of shared data
into the memory modules of a Distributed Memory Machine (DMM) in
order to minimize memory access conflicts during read operations.

In this paper we present a new randomized scheme that, with high
probability, performs any set of $r$ unrelated read operations on the shared
data set $T$ in $O(\log r + \log\log|T|)$ parallel time with no memory
conflicts and using $O(r)$ processors. The set $T$ is distributed into $m$ DMM
memory modules where $m$ is polynomial in $r$ and logarithmic in $|T|$, and
the overall size of the shared memory used by our scheme is not larger
than $(1 + 1/\log|T|)|T|$ (this means that there is "almost" no data
replication). The memory organization scheme and most of the
computations of our method do not depend on the read requests, so they can
be performed once and for all during an off-line phase. This is a relevant
improvement over the previous deterministic method recently given by Andreev et al.
when "real-time" applications are considered.



1 Introduction 



Consider a shared-memory synchronous parallel machine in which a set of $p$
processors can access a set of $b$ memory modules (also called banks) in parallel,
provided that a memory module is not accessed by more than one processor
simultaneously. The processors are connected to the memory modules through
a switching network. This parallel model, commonly referred to as Distributed
Memory Machine (DMM) or Module Parallel Computer, is considered more
realistic than the PRAM model and it has been the subject of several studies in
the literature. In an EREW PRAM, each of the $p$ processors can in
fact access any of the $N$ memory words, provided that a word is not accessed by
more than one processor simultaneously. To ensure such connectivity, the total
number of the switching elements must be $O(pN)$. For large shared memory,
constructing a switching network of this complexity is very expensive or even
impossible in practice.

One standard way to reduce the complexity of the switching network is to
organize the shared memory in modules, each of them containing several
words. A processor switches between modules and not between individual words.
So, if the total number of modules is $b \ll N$, we then need only $O(pb)$ switching
elements to realize the interconnection network.

There are two fundamental problems that always arise when the DMM model 
is adopted. The first one concerns the construction of feasible switching networks 
and related routing algorithms that must guarantee a full connectivity between 
processors and memory modules with the best achievable delay. Several random- 
ized and deterministic solutions of this problem have been derived over the last 
years (see for a good survey). 

Once the routing problem is efficiently solved, the shared data have to be 
distributed (and, if necessary, replicated) among the set of memory modules 
so that processors can access them avoiding simultaneous reading accesses on 
the same memory module. This second problem is sometimes referred to as the 
granularity problem. The importance of this problem lies in the fact that reading 
conflicts on the same shared-memory module (and, in general, any operation that 
generates conflicts in the use of shared external devices) can only be solved at 
the cost of a significant time delay. So, the memory contention, i.e. the maximum 
number of shared accesses simultaneously mapped into the same module, is one 
of the most important factors of the overall time complexity of a DMM algorithm. 

In this paper, we address the granularity problem by assuming that we have
at hand a sufficiently good solution for the routing problem and thus processors
and memory modules can be thought of as being ideally connected by a complete
bipartite graph. We also assume that our DMM model is provided with memory
interleaving technology that allows any processor to send access requests
to more than one memory module simultaneously, provided that each of these
modules is not used by other processors.

Based on the above assumptions, several works have been devoted to the de- 
sign of efficient solutions for the granularity problem. In particular, randomized 
solutions have been presented in PE ED], while less efficient but determinis- 
tic solutions have been introduced in [ 1 612 1 ) . Most of these works concern the 
problem of simulating a PRAM algorithm on a DMM model. Further relevant 
applications of the granularity problem concern the design of parallel systems 
for Private Information Retrieval (PIR) on public-accessible databases P0S|, 
parallel routing-table computations for IP lookup P3EI- 

Let T be the table of binary data to be shared and r be the number of parallel 
read accesses to be satisfied. The performance analysis of any solution of the 
granularity problem on the DMM model should mainly address the following 
aspects: 

1) The total size of the shared memory required to implement the original
table $T$; 2) The memory contention, i.e., the maximum number of simultaneous
access requests that a single memory module must satisfy; 3) The parallel time
complexity. This time complexity depends both on the local computations
performed by each processor and on the memory contention. As discussed before,
it is reasonable to assume that the latter represents a dominant factor; 4) The
number of processors.

Clearly, the role of the first aspect becomes crucial when large data tables 
have to be shared (this is the case, for instance, of public databases accessible, 
say, on WWW servers). On the other hand, in the case of PRAM simulations, 
it is reasonable to assume that the number of shared data is relatively small. 

To our knowledge, the best randomized solution for the granularity problem
in the case of PRAM simulation on the DMM model has been introduced by
Karp et al. They indeed derived a randomized method that simulates a
PRAM with $p \log\log p \log^* p$ processors by using a DMM with $p$ processors with
optimal expected delay time $O(\log\log p \log^* p)$ per step of simulation. The
memory contention is $O(\log\log p)$. Each of the shared data is replicated in $O(\log\log p)$
copies and mapped to $p$ memory modules by means of $O(\log\log p)$ hash functions.
The randomized simulation of the PRAM algorithm consists of a
sequence of consecutive phases to be executed on the DMM model. Each phase
simulates 0(p^^^°) steps of the PRAM algorithm and all the variables (observe 
that this number is at most 0(p^^^^°)) used during these steps are distributed 
into the memory modules by using new randomly selected hash functions. This 
random distribution is called the cleaning up task. Observe also that this solu- 
tion requires the a priori knowledge of the data used during the generic phase 
and which must be assigned to the memory modules. 

The above solution can be considered efficient for the problem of simulating
PRAM algorithms where, as already remarked, the number of shared data is
relatively small and it is possible to define the set of data actually used by the
program or by a part of it. On the other hand, this randomized solution
turns out to be less efficient when: i) the number of shared data is significantly
larger than the number of available processors, ii) it is not possible to determine
the set of shared data requests that will be performed in the next phase.

These are the typical situations that arise in the case of concurrent accesses 
to large data structures such as public database available on WWW servers and 
IP routers databases where “on-line” read requests have to be satisfied in “real 
time” . We emphasize that an application of Karp et al's method for this version 
of the granularity problem would imply a new assignment of the sharing data 
to the memory modules (during the cleaning up procedure) for each new set of 
read requests. 

One solution for the problem of performing read accesses on a large database 
using the DMM model has been recently given by Andreev et al |Q. The algo- 
rithm performs r arbitrary read requests on a database T of size N = 2^ within 
the following performance^ 1) The total size of the shared memory required to 
implement the original table T is 2"(1 -|- — ) to represent the input function and 
0{r^ log(r^n)) to perform extra algebraic computations; 2) There is no memory 



^ We here use a definition of processor which is different from that used in [Q. 
contention, i.e., each memory module receives at most one read request during
the algorithm; 3) The worst-case parallel time is logarithmic in r and n; 4) The 
number of processors is 0(r^n). 

Andreev et al.'s method has interesting applications for the direct-sum
problem in circuit complexity. On the other hand, one negative aspect lies in a
rather expensive setup procedure INIT whose goal is to select the correct
addresses of the memory modules that will be considered during the algorithm^.
This procedure is a sequence of non trivial search and comparing operations in- 
side a matrix M of O(r^) elements from the finite field GF{q) (where q = 0{r^n)) 
that allows to select one row of M that satisfies a certain algebraic property. More 
importantly, INIT depends on the particular values of the input requests, so it 
must be run for every new set of r requests. This setup procedure performed at 
“run time” yields an overhead cost that makes the overall algorithm not useful 
for real time applications. 



1.1 Our Result 

We provide a randomized version of Andreev et al.'s algorithm that solves the
above problem and requires neither the execution of INIT nor the implementation
of the associated matrix $M$. The processor programs are thus simpler and more
suitable for real time applications.

Given any error probability 0 < i5 < 1/2, our new randomized version per- 
forms with probability at least (1 — i5) any set of r read requests on T of N = 2" 
bits within the following complexity. 1) 2"(1 -|- ^) memory size (no additional 
shared memory for extra-computation); 2) there is no memory contention; 3) 
0(logr -I- logn -I- log(l/(5)) parallel time; 4) 0{r{{l/6)r^n)) processors. (As we 
will see later, the (r^n) factor refers to the amount of the simple xor gates 
of fan-in 2 that are required to parallelize the task of one read request); 5) 
0(log r -I- log 5) random bits. 

Observe that in case of error, the algorithm does not fail to compute the 
function but it just might have memory contention greater than 0. 

The advantage of our randomized solution is not only in the above perfor- 
mances. In fact, the distribution of T into the memory modules does not depend 
on the set of the r read requests, so it can be done off-line in a pre-processing 
phase. The use of randomness in our algorithm is required only to select a set 
of O(r^) elements from a finite field with uniform distribution. Furthermore, the 
computation of the memory module addresses which have to be used to satisfy 
the r read requests is simple and involves only elementary linear algebra on finite 
fields (more precisely, it is required to compute the set of points that belong to 
a line specified by one of its points and the parameter of its direction). Finally 
the number of memory modules in which the table is organized is polynomial in 
r and logarithmic in the size of T . As discussed before, the fact that both the 
number of processors and the number of memory modules are logarithmic in the 



^ A detailed description of the main ideas of this algorithm is given in Section |3 
number of shared data makes our solution more efficient in the case of large
database applications than those proposed for PRAM simulations. A relevant
example of such applications is the parallel implementation of Private
Information Retrieval (PIR) systems on the DMM model. Due to the lack of
space, this application will be described in the full version of this paper.



2 Description of the Algorithm 

Let $T$ be the binary database to be shared. Let $|T| = 2^n$ for some integer $n > 0$;
we then consider $T$ as the output table of a finite Boolean function $f: \{0,1\}^n \to \{0,1\}$.
According to this terminology, our problem is that of computing the
function $f$ on a set of $r$ unrelated inputs. As stated in the Introduction, we will
adopt the DMM parallel model. The time required by any processor to perform a
query to a shared memory module is denoted as extime. In our case, each shared
memory module contains one Boolean subfunction (which is stored by means
of its output table): processors can specify the input of one of these Boolean
functions and get one output bit.

Finally, in order to run randomized algorithms, we assume that a public 
pseudo-random generator of bits is available to all processors. 

Let $X_1, \ldots, X_r$ be a set of inputs for function $f$. Our first step consists of
splitting the input space $\{0,1\}^n$ into the direct product of two subspaces:

$\{0,1\}^n = \{0,1\}^{4k} \times \{0,1\}^{n-4k}$

(the correct choice of $k$ will be given later). The first subspace is here considered
as $GF(q)^4$, where $q = 2^k$. It follows that $f$ can be written as
$f(A, B)$, where $A \in GF(q)^4$, $B \in \{0,1\}^{n-4k}$, and our problem is now to compute
the set of values $f(A_1, B_1), f(A_2, B_2), \ldots, f(A_r, B_r)$ for arbitrary pairs $(A_i, B_i)$,
$i = 1, \ldots, r$.

We need here some algebraic definitions. Consider the set $GF(q)^4$ as a
4-dimensional linear space. Let $l(A,u)$ be the line passing through $A$ and parallel
to the vector $h(u) = (1, u, u^2, u^3)$. Notice that the parameter $u$ determines the
direction of the line. Let $U$ be any subset of $GF(q)$ (the correct choice of this
subset is crucial for our randomized algorithm, and it will be given in Section 3);
the term $SL_4(U)$ denotes the set of all lines $l(A,u)$ such that $A \in GF(q)^4$
and $u \in U$. We also define the set of points $l^*(A,u) = l(A,u) \setminus \{A\}$. For
any $A \in GF(q)^4$, consider the function $f_A: \{0,1\}^{n-4k} \to \{0,1\}$ defined as
$f_A(B) = f(A,B)$. Furthermore, for each line $l \in SL_4(U)$, define the function
$g_l: \{0,1\}^{n-4k} \to \{0,1\}$ as

$g_l = \bigoplus_{A \in l} f_A$.

These functions give our representation of $f$ in the shared memory, i.e., each
of them is stored in one single memory module of the DMM. Notice that the
construction (more precisely, the size and the structure of the function tables)
is independent of $f$ and of the sequence $(A_i, B_i)$, $i = 1, \ldots, r$. So, it can be
performed in a preliminary phase once and for all.

A processor can call one of these functions by paying a special time cost
denoted as extime. In what follows, we define a system of pairwise independent
"computations" of $f$.

For any $A \in l$, it is easy to prove that

$f_A(B) = g_l(B) \oplus \left(\bigoplus_{A^* \in l \setminus \{A\}} f_{A^*}(B)\right)$.

Informally speaking, the idea of our solution is that we can compute $f$ on a given
input $(A_i, B_i)$ without using the memory module that contains $f_{A_i}$.
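
The identity above can be checked on a toy instance. The sketch below is only an illustration under explicit assumptions: it uses the small prime field GF(5) in place of $GF(2^k)$, takes the direction vector to be $(1, u, u^2, u^3)$, and fixes $B$ so that each $f_A$ is a single bit. All names are ours.

```python
import itertools
import random

P = 5      # assumption: a small prime field stands in for GF(2^k)
DIM = 4    # lines live in the 4-dimensional space GF(P)^4

def line(A, u):
    """All points A + t*h(u), t in GF(P), with assumed direction h(u) = (1, u, u^2, u^3)."""
    h = (1, u % P, u * u % P, u ** 3 % P)
    return [tuple((a + t * d) % P for a, d in zip(A, h)) for t in range(P)]

def make_table(seed=0):
    """Random bits f_A(B) for one fixed B, indexed by A in GF(P)^4."""
    rng = random.Random(seed)
    return {A: rng.randrange(2) for A in itertools.product(range(P), repeat=DIM)}

def g(table, A, u):
    """Bit stored in the module of the line l(A, u): XOR of f over all its points."""
    acc = 0
    for point in line(A, u):
        acc ^= table[point]
    return acc

def read_via_line(table, A, u):
    """Recover f_A(B) from g_l and the other points of the line, never reading A itself."""
    acc = g(table, A, u)
    for point in line(A, u):
        if point != A:
            acc ^= table[point]
    return acc

table = make_table()
ok = all(read_via_line(table, A, u) == table[A]
         for A in list(table)[:50] for u in range(P))
```

The check succeeds for every direction $u$ because each point other than $A$ is XORed in twice and cancels; this freedom in the choice of $u$ is what the procedure below exploits.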

If we consider a set $\{u_1, \ldots, u_r\}$ of elements from $GF(q)$ then we can compute
$f$ on $(A_i, B_i)$, $i = 1, \ldots, r$, by applying the following simple procedure:

- Procedure $ALG_1$.

- input: $f$ (stored in the shared memory modules by means of functions $g_l$ and $f_A$);
  $(A_i, B_i)$, $i = 1, \ldots, r$;
  $\{u_1, \ldots, u_r\}$ ($u_i \in GF(q)$), $i = 1, \ldots, r$.

- begin

- for any $i = 1, \ldots, r$ do

  • begin

  • read $g_{l(A_i,u_i)}(B_i)$;

  • for any $A^* \in l^*(A_i,u_i)$ read $f_{A^*}(B_i)$;

  • end

- for any $i = 1, \ldots, r$ compute

  $f(A_i, B_i) = g_{l(A_i,u_i)}(B_i) \oplus \left(\bigoplus_{A^* \in l^*(A_i,u_i)} f_{A^*}(B_i)\right)$ (1)

- end.



The system in Eq. (1) suggests a way to avoid memory contention in $ALG_1$: it
suffices to find a set of elements $\{u_1, \ldots, u_r\}$ such that any function of type $f_A$
or $g_l$ (and so any memory module) can participate in only one equation of the
system.

In the deterministic algorithm presented by Andreev et al in [Q, this task 
is solved by means of a rather expensive deterministic procedure INIT that
considers a suitable matrix $M(i,j)$ of distinct elements from $GF(q)$ with $r$ columns and
then computes the first row $(u_1, \ldots, u_r)$ of $M$ for which the following property
holds:

$$\text{for any } j_1 \neq j_2, \quad l^*(A_{j_1}, u_{j_1}) \cap l^*(A_{j_2}, u_{j_2}) = \emptyset. \qquad (2)$$

This task (in particular, that of checking the above property) is expensive in
terms of number of processors and parallel time, and requires nontrivial algebraic
operations and comparisons in $GF(q)$. Furthermore, we emphasize that the output
of the procedure INIT in fact depends on $A_i$ ($i = 1, \ldots, r$), hence it must be
performed for every possible value of such prefixes.

In the next section, we will give an algebraic lemma that allows us to avoid the
procedure INIT by using a suitable random choice of the sequence $(u_1, \ldots, u_r)$.

3 Avoiding Memory Contention via Randomness 

The randomized algorithm, which on any input sequence $A_1, \ldots, A_r$ returns an
output sequence $u_1, \ldots, u_r$ satisfying Property (2) with high probability, relies on the following result.

Lemma 1. Let $c > 1$ and let $U$ be any subset of $GF(q)$ such that $|U| \geq cr^3$. Let

$$M = \{u_{i,j},\ i = 1, \ldots, cr^2;\ j = 1, \ldots, r\}$$

be a matrix of pairwise distinct elements from $U$. The probability that a randomly
chosen index $i_R$ satisfies the property

$$\text{for any } j_1 \neq j_2, \quad l^*(A_{j_1}, u_{i_R,j_1}) \cap l^*(A_{j_2}, u_{i_R,j_2}) = \emptyset$$

is at least $1 - \frac{1}{2c}$.

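Proof. Let us define the subset

$$BAD = \{ i \in \{1, \ldots, cr^2\} \mid \exists\, j_1(i) \neq j_2(i):\ l^*(A_{j_1(i)}, M_{i,j_1(i)}) \cap l^*(A_{j_2(i)}, M_{i,j_2(i)}) \neq \emptyset \}.$$

Assume that for some $U \subseteq GF(q)$ with $|U| \geq cr^3$ and for some matrix $M$ of $cr^3$
distinct elements of $U$ we have that

$$|BAD| \geq \frac{cr^2}{2c} = \frac{r^2}{2}.$$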

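For any $i \in BAD$, consider two indexes $j_1(i) \neq j_2(i)$ for which

$$l^*(A_{j_1(i)}, M_{i,j_1(i)}) \cap l^*(A_{j_2(i)}, M_{i,j_2(i)}) \neq \emptyset \qquad (3)$$

(observe that, from the definition of BAD, these two indexes must exist).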

From the condition of the lemma y^ Ui,j2(d Eq. 2] we easily have 

that y^ Since the number of possible pairs with 

^ ^j2{i)) is 

ir(r- 1) < < \BAD\ 

then at least two different i± and i2 exist for which and ^^2(21) = 

^1 ’^ji{ii) '^^1(^2) 7 "^2 '^^2(0) ■^^2(^2)? ^nd also define 

Cl /(7I2, 7 C2 ^i 2 ,ji(^ 2 )) 7 ^i 2 ,i 2 (^ 2 )) ■ 
Consider now the four vectors

$$V_1 = C_1 - A_1, \quad V_2 = A_2 - C_1, \quad V_3 = C_2 - A_2, \quad V_4 = A_1 - C_2.$$

It is easy to verify that they are linearly dependent, i.e. $V_1 + V_2 + V_3 + V_4 = 0$.

Furthermore, we have that

$$V_1 \parallel h(u_1), \quad V_2 \parallel h(u_2), \quad V_3 \parallel h(u_3), \quad V_4 \parallel h(u_4).$$

At least two of the above vectors $V_1, V_2, V_3, V_4$ are not zero. It follows that the
vectors $h(u_i)$, $i = 1, \ldots, 4$, should be linearly dependent. But this is not true:
these vectors constitute the well-known Vandermonde determinant, which is
always nonzero for pairwise distinct values of $u_i$, $i = 1, \ldots, 4$. This implies that
$|BAD| < \frac{r^2}{2}$ and the lemma is proved.

□ 

Informally speaking, this lemma states that if we randomly choose a row of
$M$ then, with high probability, the $r$ elements contained in this row can be used
to compute the system in Eq. (1) while avoiding read conflicts on memory modules.

4 The Global DMM Algorithm and Its Performance 
Analysis 

In what follows, we give an overall description of all the tasks performed by the
global algorithm denoted as ALG. ALG receives as input an integer parameter
$c > 1$, two positive integers $n$ and $r$ ($1 < r < 2^n$), the output table $T$ of a Boolean
function $f : \{0,1\}^n \to \{0,1\}$, and a set of $r$ inputs $\{X_i = (A_i, B_i),\ i = 1, \ldots, r\}$.

1. The Pre-Processing Task: The Shared Memory Partition. Let

$$k = \lceil \log c + 3 \log r + \log n \rceil, \quad \text{and} \quad q = 2^k.$$

Consider the field $GF(q)$ using its standard binary representation. Define $U$
as the first $cr^3$ elements in $GF(q)$ (any ordering of the field works well). Then,
we store the subtables $f_A$ (for any $A \in GF(q)^4$) and $g_l$ (for any $l \in SL_4(U)$)
in the shared memory modules. Note that this memory structure depends
only on $n$ and $r$, so if some values in the table $T$ of $f$ change (and/or
some input), we just need to update some of the subtables but we do not
need to update the memory organization scheme.

2. The Randomized Procedure RAND. Consider the $(cr^2 \times r)$-matrix $M$
where

$M(i,j)$ is the $((i-1)r + j)$-th element of $U$

(note that we don't need to store $M$ in the shared memory).

Choose uniformly at random an index $i_R$ from the set $\{1, \ldots, cr^2\}$ and,
for any $j = 1, \ldots, r$, return $u_j = M(i_R, j)$.

3. The Procedure ALG_1. Apply procedure ALG_1 using $(u_1, \ldots, u_r)$.
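The procedure RAND can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `rand_row` and the parameter `rows` (the number of rows of $M$) are assumptions, $U$ is modelled as a Python list, and the index formula $((i-1)r + j)$ is the one stated in the performance analysis. As in the text, $M$ is never materialized.

```python
import random

def rand_row(U, r, rows, rng=random):
    """Pick a uniformly random row index i in 1..rows and return (i, [u_1, ..., u_r]).

    u_j is the ((i-1)*r + j)-th element of U (1-based), so the matrix M itself
    is never stored.
    """
    if len(U) < rows * r:
        raise ValueError("U must hold at least rows * r elements")
    i = rng.randint(1, rows)                       # uniform over {1, ..., rows}
    return i, [U[(i - 1) * r + j - 1] for j in range(1, r + 1)]

# Toy usage: 4 rows of 3 elements each, drawn from a 12-element set U.
U_toy = list(range(100, 112))
i_R, us = rand_row(U_toy, r=3, rows=4, rng=random.Random(1))
```

Because the rows are disjoint slices of $U$, the returned elements are pairwise distinct whenever the elements of $U$ are.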
We now analyse the costs of ALG by assuming that the pre-processing task
has already been done (as already observed, this task depends only on $r$ and $n$).
However, we remark that even this task can be efficiently parallelized since the
number of subtables is $|GF(q)^4| + |SL_4(U)|$, which is polynomial in $n$ and $r$.

Since we have defined $k = \lceil \log c + 3 \log r + \log n \rceil$ and $q = 2^k$, it follows that

$$|SL_4(U)| = \frac{|U|}{|GF(q)|}\, 2^{4k} \leq \frac{2^{4k}}{n}.$$

The total size of the shared memory used to implement $f$ is thus the
following:

$$mem(ALG) = |GF(q)^4|\, 2^{n-4k} + |SL_4(U)|\, 2^{n-4k} \leq \left(1 + \frac{1}{n}\right) 2^n.$$

Assume that we have $r$ processors $\{p_1, \ldots, p_r\}$.

- In the procedure RAND we select an element $i_R \in \{1, \ldots, cr^2\}$ by making
$\lceil \log r + \log c \rceil$ calls of the public pseudo-random generator. Then $p_j$ ($j = 1, \ldots, r$)
returns the element $u_j$ by computing the $((i_R-1)r+j)$-th element of $U$ as specified
by the procedure RAND. This computation in $GF(q)$ requires $O(k)$ time using
0{k‘^) processors. 

- The third phase requires the computation of procedure ALG_1. For any $i =
1, \ldots, r$, $p_i$ computes the function in Eq. (1).

Assume now that the sequence $u_1, \ldots, u_r$ satisfies the property in Eq. (2) (by
Lemma 1 this happens with probability at least $1 - 1/(2c)$). In this case,
each shared memory module receives at most one query, so the memory
contention is 0. It follows that the task of each processor $p_i$ can be performed in
$O(\log r + \log c + \log n) + extime$ parallel time using a number of Boolean gates
of fan-in two that satisfies the bound $O(|l^*(A,u)|) = O(2^k) = O(cr^3 n)$.

Finally, we have proved the following theorem. 

Theorem 2. Given any $c > 1$, the algorithm ALG computes with probability at
least $(1 - 1/(2c))$ any $n$-input Boolean function $f$ on a set of $r$ inputs, within
the following complexity:

- $(1 + \frac{1}{n})\, 2^n$ memory size (no additional shared memory for extra computation);

- $O(\log r + \log n + \log c) + extime$ parallel time (with no memory contention);

- $r$ processors, each of them having $O(cr^3 n)$ Boolean gates of fan-in 2;

- $O(\log r + \log c)$ random bits.
Abstract. We analyze the average time complexity of evaluating all
prefixes of an input vector over a given algebraic structure $(\Sigma, \otimes)$. As
a computational model, networks of finite controls are used, and a
complexity measure for the average delay of such networks is introduced.
Based on this notion, we then define the average case complexity of a
computational problem for arbitrary strictly positive input distributions.
We give a complete characterization of the average complexity of prefix
functions with respect to the underlying algebraic structure $(\Sigma, \otimes)$ resp.
the corresponding Moore-machine $M$. By considering a related reachability
problem for finite automata it is shown that the complexity only
depends on two properties of $M$, called confluence and diffluence. We
prove optimal lower bounds for the average case complexity. Furthermore,
a network design is presented that achieves the optimal delay for
all prefix functions and all inputs of a given length while keeping the
network size linear. It differs substantially from the known constructions
for the worst case.



1 Introduction 

The parallel prefix problem is a fundamental task with a lot of applications such 
as addition of binary numbers or solving linear recurrences. To each such prob- 
lem one associates a specific semigroup that describes the algebraic structure of 
the problem. The complexity of the parallel prefix problem has extensively been 
investigated in the circuit model. The analysis of the expected time of computing 
the parallel prefix function goes back to the analysis of Burks, Goldstine, and von 
Neumann 1946. In |S| it is pointed out that the speed of the arithmetic-logic unit 
of a processor mainly depends on the speed of the adder used - which can sig- 
nificantly be accelerated in the average. The authors have explored the expected 
length of a carry chain and shown a logarithmic upper and lower bound for these 
chains. The results of Burks et. al. have been improved by different authors (see 
for example !ilj|4l i( li!). Based on these results several models for computing 
the sum of two n-bit binary numbers efficiently in the average case have been 
analyzed in the literature. Expected case upper bounds of order O(loglogn) for 
the addition and other prefix problems have also been obtained in m and m-
Reif has observed that circuit depth 0(log log n) is sufficient if one allows a small 
portion of input vectors for which a wrong result may be obtained. He has also 
introduced a circuit that supervises these errors and corrects them - but this 
requires one gate of unbounded fanin (see also 03 ). In contrast to Reif’s circuit 
in m we have presented a standard circuit that correctly computes the addition 
within an asymptotic minimum time for all inputs. 

Every prefix function for which the underlying algebraic structure is a semi- 
group can obviously be computed in logarithmic depth. Ladner and Fischer have 
shown how this can be done in parallel using only linear circuit size im . Further, 
they have shown that each function that can be expressed by a Moore-machine 
can be transformed into a semigroup (if,®) where |I7| is exponential in the size 
of the Moore-machine. Snir has obtained exact bounds for the tradeoff between 
the depth and the size of prefix circuits H3 Bilardi and Preparata have studied 
time-size tradeoffs for this problem. They have given a complete characterization 
of the semigroups with respect to this question 0 by providing tight lower and 
upper bounds. The complexity depends on algebraic properties of the semigroup. 
It is shown that only two cases are possible: within a wide range either the prod- 
uct of time and size grows only linearly or as nlogn. In 0 the same authors 
have studied constant depth unbounded-fanin circuits and classify semigroups 
according to the property of having linear size prefix circuits. 

In 0 we have defined an average measure for the delay of Boolean circuits 
called time. The idea is to take advantage of favorable cases in which the value 
of a prefix can be computed faster than within the trivial lower bound of log- 
arithmic depth. We have shown that for several semigroups like the Boolean 
semigroup with the OR or the AND-operator or for the semigroup corresponding 
to the addition of binary numbers one can compute all prefixes with an average 
delay of order log log n |9lll)j . Furthermore, the circuit size can be kept linear. 
That means there is an exponential speedup from the worst case to the average 
case. On the other hand, for functions like PARITY or MAJORITY no speedup is 
possible. In m we have also shown that there is a polynomial time decidable 
algebraic property, called confluence, that decides whether an average delay of 
order log log n can be achieved or not. 

To generalize these results to arbitrary parallel machines with restricted de- 
gree like EREW-PRAMs, CREW-PRAMs, or networks of RAMs we will extend 
the circuit model to processor networks where each computational unit consists 
of a finite control. In contrast to the circuit model the nodes of this networks 
can compute interim results which simplify the analysis of the lower bounds and 
extensions to more general models. On the other hand the results can be applied 
to the circuit model by using a statistical estimation (see 0). 

Based on HD! the relation between the average case complexity of the com- 
putation of parallel prefix problems in a processor network and the structure of 
the underlying groupoid will be studied in more detail. Using the notion of con- 
fluence and diffluence a complete characterization will be given saying that 
only three substantially different cases are possible. Either the average delay is 
constant, or is of order log log n, or it is of logarithmic order, that means equal
to the worst case. To obtain these results we translate the prefix functions into 
a reachability problem. It will be shown that the reachable sets depend on al- 
gebraic properties of the underlying Moore-machine. For further investigations 
see 0. 

2 Definitions 

2.1 The Prefix Functions 

Define $w := w[1]w[2] \ldots \in \Sigma^*$ and for $i \leq j \in \mathbb{N}$, $w[i;j] := w[i] \ldots w[j]$. Further
let $\lambda$ be the empty word. A prefix function can be defined by the transition
function of a Moore-machine. Thus, we define:

Definition 1 A Moore-machine $M = (Q, \Sigma_i, \Sigma_o, q_0, \delta, \gamma)$ is a 6-tuple where
$Q$ denotes a set of states, $\Sigma_i$ the input alphabet, $\Sigma_o$ the output alphabet, $q_0 \in Q$
the starting state, $\delta : Q \times \Sigma_i \to Q$ the transition function and $\gamma : Q \to \Sigma_o$ the
output function of $M$. To extend the transition function to strings $w \in \Sigma_i^*$ we
define for $q \in Q$ and $w \in \Sigma_i^*$: $\delta(q,w) = q$ if $w = \lambda$ and $\delta(\delta(q, w[1]), w[2; |w|])$
elsewhere. Further, define the transformation function

$$T_{M,q}(w) := \gamma(q)\,\gamma(\delta(q,w[1]))\,\gamma(\delta(q,w[1;2])) \ldots \gamma(\delta(q,w)).$$

For easier notation we assume that $\gamma(q_0) = \lambda$. The transformation function of $M$ is
given by $T_M(w) := T_{M,q_0}(w)$. Furthermore define $\delta(Q',w) := \{\delta(q,w) \mid q \in Q'\}$,
$\delta(q,W) := \{\delta(q,w) \mid w \in W\}$, and $\delta(Q',W) := \bigcup_{w \in W} \delta(Q', w)$ for $Q' \subseteq Q$
and $W \subseteq \Sigma_i^*$. For a Moore-machine $M = (Q, \Sigma_i, \Sigma_o, q_0, \delta, \gamma)$ and a state $q \in Q$
define $M_q := (Q, \Sigma_i, \Sigma_o, q, \delta, \gamma)$.

A function $f : \Sigma_i^* \to \Sigma_o^*$ is called a prefix function if there exists a
Moore-machine $M$ with $T_M = f$. The set of all prefix functions is denoted by $\mathcal{F}_{prefix}$.
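As an illustration of the transformation function, the sketch below runs a Moore-machine on an input word and collects the outputs of the visited states. The dictionary encoding of $\delta$ and $\gamma$ and the name `transform` are assumptions made for this example; the output of the starting state is skipped, in line with the convention $\gamma(q_0) = \lambda$.

```python
def transform(delta, gamma, q0, w):
    """Outputs gamma(q) of the states visited while reading w from q0.

    delta maps (state, symbol) -> state, gamma maps state -> output symbol.
    The start state contributes nothing (its output is taken to be empty).
    """
    out = []
    q = q0
    for x in w:
        q = delta[(q, x)]
        out.append(gamma[q])
    return out

# Assumed toy machine: parity of the bits read so far, with a separate start state "s".
parity_delta = {("s", 0): 0, ("s", 1): 1,
                (0, 0): 0, (0, 1): 1,
                (1, 0): 1, (1, 1): 0}
parity_gamma = {0: 0, 1: 1}
parity_prefixes = transform(parity_delta, parity_gamma, "s", [1, 0, 1, 1])
```

The $t$-th output position depends only on the first $t$ input symbols, which is what makes the function a prefix function.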

Analogously, $\mathcal{F}_{prefix}$ can be defined using Mealy-machines or deterministic
generalized sequential machines, respectively. On the other hand we can characterize
prefix functions by an algebraic structure $(\Sigma, \otimes)$, called groupoid, and a
homomorphism $h$. Note that in contrast to a semigroup or a group the binary operator
$\otimes$ of a groupoid is not necessarily associative.

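Definition 2 Let $\Sigma, \Sigma_i, \Sigma_o$ be finite alphabets with $\Sigma_i \subseteq \Sigma$. For a groupoid
$(\Sigma, \otimes)$ (i.e. an algebraic structure with a binary operation $\otimes : \Sigma \times \Sigma \to \Sigma$) and
an input $w \in \Sigma_i^*$ let $\otimes(w) := w[1] \otimes \ldots \otimes w[|w|]$. Then define for $j \in \mathbb{N}$ and
$w \in \Sigma_i^*$: $PP^j_{(\Sigma,\otimes),\Sigma_i}(w) := \otimes(w[1;j])$ if $|w| \geq j$, and $PP^j_{(\Sigma,\otimes),\Sigma_i}(w) := \lambda$ elsewhere.

Further, let $PP_{(\Sigma,\otimes),\Sigma_i}(w) := \otimes(w[1])\, \otimes(w[1;2])\, \otimes(w[1;3]) \ldots$ For easier
notation define $PP_{(\Sigma,\otimes)} := PP_{(\Sigma,\otimes),\Sigma}$.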

A straightforward analysis leads to the following relation: 

Proposition 1 A function $f : \Sigma_i^* \to \Sigma_o^*$ is a prefix function iff there exists a
groupoid $(\Sigma, \otimes)$ and a homomorphism $h : \Sigma^* \to \Sigma_o^*$ with $f = h \circ PP_{(\Sigma,\otimes),\Sigma_i}$.
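The prefix computation over a groupoid can be sketched sequentially as below. This is an illustration under assumptions: the name `prefixes` is hypothetical, and the product $w[1] \otimes \ldots \otimes w[j]$ is evaluated from left to right, which matches the transition rule $\delta(q,x) := q \otimes x$ of the corresponding Moore-machine. The evaluation order matters because the operator need not be associative.

```python
def prefixes(op, w):
    """Return [w1, w1 op w2, (w1 op w2) op w3, ...], evaluated left to right."""
    out = []
    acc = None
    for i, x in enumerate(w):
        acc = x if i == 0 else op(acc, x)
        out.append(acc)
    return out

def carry_op(x, y):
    """Carry propagation operator as described in the text: keep x if y is 'pro', else take y."""
    return x if y == "pro" else y

def sub_mod3(x, y):
    """A non-associative toy operator on {0, 1, 2} (an assumption for illustration)."""
    return (x - y) % 3

carry_out = prefixes(carry_op, ["gen", "pro", "pro", "del", "pro"])
sub_out = prefixes(sub_mod3, [2, 1, 1])
```

For `sub_mod3`, regrouping changes the result: $(2 - 1) - 1 = 0$ but $2 - (1 - 1) = 2$, so a parallel tree evaluation cannot be applied to such an operator directly.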



The Average Time Complexity to Compute Prefix Functions 






In order to index the corresponding Moore-machine of a groupoid define:

Definition 3 For an algebra $(\Sigma, \otimes)$, an input alphabet $\Sigma_i \subseteq \Sigma$, and a
homomorphism $h : \Sigma^* \to \Sigma_o^*$ define the Moore-machine $M_{(\Sigma,\otimes),\Sigma_i,h} := (\Sigma \cup
\{q_0\}, \Sigma_i, \Sigma_o, q_0, \delta, h)$ with $h(q_0) := \lambda$. Further, define for all $q \in \Sigma \cup \{q_0\}$ and
$x \in \Sigma_i$: $\delta(q,x) := q \otimes x$, if $q \neq q_0$, and $\delta(q_0,x) := x$, elsewhere.

A Moore-machine $M$ will be called minimal for the prefix function $f = T_M$ iff
for any pair of distinct states $q_1, q_2 \in Q$ it holds that $T_{M,q_1} \neq T_{M,q_2}$. Analogously, a groupoid
$(\Sigma, \otimes)$ with a homomorphism $h$ is called minimal for $f = h \circ PP_{(\Sigma,\otimes),\Sigma_i}$ if
for any pair of distinct elements $x, y \in \Sigma$ there exists a string $w \in \Sigma_i^*$ such that $h(\otimes(xw)) \neq
h(\otimes(yw))$. Note that the corresponding Moore-machine $M_{(\Sigma,\otimes),\Sigma_i,h}$ of a minimal
algebra is also minimal. In the following only minimal algebras and
Moore-machines will be investigated. To mark the corresponding minimal algebra and
homomorphism resp. Moore-machine of a prefix function $f$ we write $(\Sigma_f, \otimes_f)$,
$h_f$, and $M_f = (Q_f, \Sigma_i, \Sigma_o, q_{0,f}, \delta_f, \gamma_f) := M_{(\Sigma_f,\otimes_f),\Sigma_i,h_f}$.

2.2 Confluence and Diffluence 

Using the corresponding Moore-machine $M_f$ of a prefix function $f$ we will now
define some basic features, which allow us to analyze the average time of a
network. Define the reachability set $R_{M,q}(t) := \delta(q, \Sigma_i^t)$ as the set of states
that can be reached by $M$ in exactly $t \in \mathbb{N}$ steps starting at state $q \in Q$. Further
let $R_M := R_{M,q_0}$. It is easy to see that these reachability sets share the following
property for all $t_1, t_2 \in \mathbb{N}$ and $q \in Q$: $R_{M,q}(t_1) = R_{M,q}(t_2)$ iff for all $i \in \mathbb{N}$ it holds that
$R_{M,q}(t_1 + i) = R_{M,q}(t_2 + i)$. Since there are at most $2^{|Q|}$ different possibilities for
$R_{M,q}(t)$, this property implies the existence of numbers $\tau, \pi \in \{1, 2, \ldots, 2^{|Q|}\}$
with $R_M(\tau) = R_M(\tau + i \cdot \pi)$ for all $i \in \mathbb{N}$. Let us call the smallest $\tau$ the
start of periodicity $\tau(M)$, and the smallest $\pi$ the period $\pi(M)$ of $M$. For
a prefix function $f \in \mathcal{F}_{prefix}$ define $R_f := R_{M_f}$, $\tau(f) := \tau(M_f)$, and $\pi(f) := \pi(M_f)$.
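Since each reachability set is determined by its predecessor, the sequence $R_M(0), R_M(1), \ldots$ is eventually periodic, and the start of periodicity and the period can be found by iterating until a set repeats. The sketch below does this for a machine given as a dictionary; the name `start_and_period` is hypothetical, and clamping the start to at least 1 reflects the range $\{1, \ldots, 2^{|Q|}\}$ stated in the text (an assumption about how the case of an immediately periodic sequence is counted).

```python
def start_and_period(delta, alphabet, q):
    """Return (tau, pi) for the sequence of sets of states reachable in exactly t steps."""
    seen = {}                                  # reachability set -> first time it occurred
    current = frozenset([q])
    t = 0
    while current not in seen:
        seen[current] = t
        current = frozenset(delta[(s, x)] for s in current for x in alphabet)
        t += 1
    first = seen[current]                      # the cycle starts here
    return max(1, first), t - first

# Parity machine with a separate start state "s": R(1) = R(2) = ... = {0, 1}.
parity_delta = {("s", 0): 0, ("s", 1): 1,
                (0, 0): 0, (0, 1): 1,
                (1, 0): 1, (1, 1): 0}
# A two-state counter over a one-letter alphabet alternates between {a} and {b}.
counter_delta = {("a", "x"): "b", ("b", "x"): "a"}

parity_tp = start_and_period(parity_delta, [0, 1], "s")
counter_tp = start_and_period(counter_delta, ["x"], "a")
```

The loop terminates after at most $2^{|Q|}$ iterations because there are only that many subsets of states.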

To construct a parallel machine that computes a prefix function / efficiently 
in the average, we will use a search strategy for local substrings, that determine 
the output of that means the tth output position of /, independently from 
the concrete prefix of the input. So we define: 

Definition 4 Let $q_1, q_2$ be two states of a Moore-machine $M$. A string $w \in \Sigma_i^*$
with $\delta(q_1,w) = \delta(q_2,w)$ is called a confluence of these states. Further, $w$ is
a confluence for a subset $Q' \subseteq Q$ if $|\delta(Q',w)| = 1$. $M$ has a $t$-confluence
if $R_M(t)$ is confluent. A $\tau(M)$-confluence is called a canonical confluence of
$M$. A function $f \in \mathcal{F}_{prefix}$ is called $t$-confluent if $M_f$ is $t$-confluent. A
Moore-machine $M$ (resp. a function $f \in \mathcal{F}_{prefix}$) is called strictly $(t, \ell)$-confluent if
each string $w \in \Sigma_i^{\ell}$ is a confluence of $R_M(t)$ (resp. of $R_f(t)$). Define $\mathcal{F}_{conf}$
resp. $\mathcal{F}_{str-conf}$ as the set of all canonical confluent or strictly $(\tau(f), \ell)$-confluent
functions $f \in \mathcal{F}_{prefix}$ for some $\ell \in \mathbb{N}^+$.
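Whether a word is a confluence for a set of states can be tested directly from the definition. The sketch below is illustrative only: the helper names are hypothetical, the search is brute force over all words up to a given length, and the two example machines (carry propagation and parity, both without their start state) are assumed encodings of the examples discussed in the text.

```python
from itertools import product

def run(delta, q, w):
    """State reached from q after reading the word w."""
    for x in w:
        q = delta[(q, x)]
    return q

def is_confluence(delta, states, w):
    """w is a confluence for the set of states if all of them are mapped to one state."""
    return len({run(delta, q, w) for q in states}) == 1

def shortest_confluence(delta, alphabet, states, max_len):
    """Brute-force search for a shortest confluence of length <= max_len, or None."""
    for length in range(max_len + 1):
        for w in product(alphabet, repeat=length):
            if is_confluence(delta, states, w):
                return list(w)
    return None

SYMS = ["pro", "gen", "del"]
carry = {(q, x): (q if x == "pro" else x) for q in SYMS for x in SYMS}
parity = {(q, x): q ^ x for q in (0, 1) for x in (0, 1)}

carry_word = shortest_confluence(carry, SYMS, SYMS, 3)
parity_word = shortest_confluence(parity, (0, 1), (0, 1), 6)
```

For the carry machine a single `gen` (or `del`) symbol resets every state, whereas every input symbol of the parity machine permutes the states, so no confluence exists.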

The parameters of a strictly confluent prefix function can be bounded as follows: 
Lemma 1 Let $f \in \mathcal{F}_{str-conf}$ be strictly $(t, \ell)$-confluent for some $t \geq \tau(f)$ and
$\ell \in \mathbb{N}$, then $f$ is also strictly $(\tau(f), |\Sigma|^2 / 2)$-confluent.

For easier notation we will call the functions $f \in \mathcal{F}_{str-conf}$ strictly confluent. To
analyze the lower bounds a property complementary to the confluence is needed,
which is called diffluence.

Definition 5 Let $q_1, q_2$ be two states of a Moore-machine $M$. A string $w \in \Sigma_i^*$
is called a diffluence of these states if $\{q_1, q_2\} = \delta(\{q_1, q_2\}, w)$. $w \in \Sigma_i^*$
is called a diffluence of a subset $Q' \subseteq Q$ if $w$ is a diffluence for a pair
$q_1, q_2 \in Q'$. $Q'$ is called strictly diffluent iff each string $w \in \Sigma_i^*$ can be
completed to a diffluence for $Q'$, i.e. $\forall w \in \Sigma_i^*\ \exists u \in \Sigma_i^* : \{q_1, q_2\} =
\delta(\{q_1, q_2\}, wu)$. $M$ is (strictly) $t$-diffluent if $R_M(t)$ is (strictly) diffluent.
$M$ is called (strictly) canonical diffluent if $M$ is (strictly) $\tau(M)$-diffluent. $f \in
\mathcal{F}_{prefix}$ is called (strictly) $t$-diffluent if $M_f$ is (strictly) $t$-diffluent. Define $\mathcal{F}_{diff}$
resp. $\mathcal{F}_{str-diff}$ as the set of all (strictly) canonical diffluent functions $f \in \mathcal{F}_{prefix}$.

Figure n gives four examples of prefix functions. The first graph shows the tran- 
sition graph of the carry propagation /carry, that is the basic groupoid of the 
addition. FigureOlb to^d are the transition graphs of the parity function /parity, 
the negation /not, and the prefix function The corresponding groupoid 

$(\Sigma, \otimes_{carry})$ of the carry propagation is given by $\Sigma_i := \Sigma := \{pro, gen, del\}$ and
$x \otimes_{carry} y := x$, if $y = pro$, and $x \otimes_{carry} y := y$, elsewhere. The corresponding
groupoids for /parity, /not, and can be defined analogously. Investigating 

these transition graphs it can be seen that /carry S 2Fconf-diff := -^conf H J/uff, 
/parity, W € JFgtr-diff, and /not G ^str-conf- A general relationship between 
-Astr-conf, ^conf-diff, and J^str-diff is given in the following theorem: 




Fig. 1. The transition graphs of the Moore-machines of four prefix functions. 



Theorem 1 $\mathcal{F}_{prefix}$ is a disjoint union of $\mathcal{F}_{str-conf}$, $\mathcal{F}_{conf-diff}$, and $\mathcal{F}_{str-diff}$.



For a function $f \in \mathcal{F}_{conf}$ let $\alpha(f)$ be the minimum multiple of $\pi(f)$ such that
there exists a canonical confluence of $M_f$ of length $\alpha(f)$. For $f \in \mathcal{F}_{diff}$ let $\beta(f)$ be
the minimal length of a canonical diffluence of $M_f$ such that $\beta(f) \bmod \pi(f) = 0$.
Further, we assume that $\beta(f) \geq \alpha(f)$ if $f \in \mathcal{F}_{conf-diff}$.
For associative operators $\otimes$, like $\otimes_{carry}$ for the addition, we get $\tau(PP_{(\Sigma,\otimes)}) \leq
|\Sigma|$ and $\pi(PP_{(\Sigma,\otimes)}) = 1$. Furthermore, it holds that $\alpha(PP_{(\Sigma,\otimes)}) = 1$ if $PP_{(\Sigma,\otimes)} \in
\mathcal{F}_{conf}$ and $\beta(PP_{(\Sigma,\otimes)}) = 1$ if $PP_{(\Sigma,\otimes)} \in \mathcal{F}_{diff}$.

2.3 Probability Distributions 

To investigate the expected time of a machine we consider families of prob- 
ability distributions ■= ■ ■ ■ with ^Si,n ■ IR where 

fJ‘Si,niw) 0 iff |w| = n. Furthermore, we bound the computational complexity 
of these probability distributions by restricting ourself to those which can be 
approximated by binomial distributions. It is shown in jldSI that this restric- 
tion covers the strictly positive distributions that are described by distribution 
generating circuits or finite irreducible Markov chains. 

Definition 6 Let p G (0; 1). Define [P] as the set of all families of dis- 
tributions p'= 1 p, 1 p) • ■ • with „ p 17j* — > [0; 1], such that for 

all w G 17" > p" and 0 elsewhere. Furthermore it holds for any 

p-distributed random variable W : Vx G 17i Vuu G 17"“^ : Pr[F = 

x\W = uY v]>p. 

To measure the probability of a canonical confluence of a p-distributed 
random variable W we define the rank of confluence of / G iFp^efix and 
W as Pf .= Sf ■ - where Sf is the set of all canonical confluences of 

length a{f) of /. Further define the rank of diffluence of /, W and a pair 
of diffluent states qi,q 2 G i?/(T(/)) as Pf{qi, q^)'— 0 /( 91 , 92 ) • and Pf-— 
maXqj_q 2 Gfl(T(/))F/( 9 i, 92 ) where Of{qi,q2) denotes the set of all diffluences of 
length /?(/) of 9 i and 92 . 

2.4 Networks of Finite Controls 

As the parallel model we will consider networks of finite controls, which are 
closely related to the model of cellular automata: A network of finite control 
(NFC) Nnet:= {K, Kin, E) of degree d is a DAG (K,E) of 

degree d where the nodes are weighted by synchronised finite controls All®! := 
(Q, E, E, A, r, 9o,i). The input nodes of Nnet are given by FfinC K and the 
output nodes of TVnet by ITout^ K. The input of an NFC is given by the 
starting states of the input nodes and the output by the final states of the output 
nodes. The transition function A : Q x E‘^ ^ E of node iGl*! is a function over 
the state of A'l®! and the ’’outputs” of the direct predecessors . . . , /gIhI of 
where ij < ij+i- Let qi,i, ... be the sequence of states of Then 
it holds that for each t > 0: qt+ip = A{qt^i, F{qt,ifi), . . .,^{qt^^^.)) . 

Define the computation time timejvNETC"*^) of an NFC TVnet on input w G 
E* as the minimum t such that either for all output nodes holds qt-i,i = 
qt^i- The output /jvnetC'*^) of TVnet on input w is given by output of the output 
nodes at step timeNNET('a^)- Lor a function / G .^prefix let A4net denote the set 
of all NFCs which compute / on inputs of length n. 
Like for the circuit model, the domain of a single NFC is bounded to inputs of
the same length. So we are mostly interested in families of NFCs, where different 
NFCs have a different number of input nodes. A family of NFCs of a function / 
is an infinite sequence A^net> -^net; • ■ • of NFCs with A^net ^ -^net- ^ 

family of NFCs constructible if there exists a polynomial time bounded DTM 
that outputs A^net input 1” for all n G IN. For a prefix function / let A^net 
denote the set of all constructible families of NFCs for /. 

Definition 7 For an NFC A^net input alphabet and a distributions 
FSi,n define the expected time ■= J2wGS"FSi,n{w) ■ 

timeAr»^^(w). For a family of NFCs M := A^jJgrp, A^j^grp, . . . and a family of 
distributions yisi ■= FSi,2, ■ ■ ■ we write etime(Af, = / iff for all 

n G IN holds yisi.n) = f{n) ■ 

Further, we extend this definition of the expected time to sets of families of 
NFCs A4 and sets of families of distributions I? in a natural way. For example: 
Si) < / 3M G M ypisi G V : etime(M, ^i:,) < / 

etime(Ad,I?i;,) > / :4=^ VM G M G T> : etime(M, > / . 

In the following we will show sharp bounds for the expected time behavior of 
NFCs computing prefix functions. The lower bounds follow from some arguments 
based only on restrictions of the data flow. Hence, we can translate these bounds 
also to other parallel models like EREW-PRAMS or PRAM-networks of bounded 
degree. 



3 Preliminaries 



Before we analyze how the existence of a canonical confluence or a canonical 
diffluence influence the expected delay of a NFC in detail, we will show upper 
and lower bounds for the probability that the maximal length of a non-confluent 
substring of a ,8i;;^,„[p]-distributed random variable W passes a given bound. 
More precisely, we will analyze the expected length of pro^(tu, i) := maxj<i{i — 
j\w[j;i] is not a confluence for Rf{j)} and proy,(t(;) := maxjg[p.|^|] proy (w, f). 

For easier notation let $\mathrm{llog} := \log_2 \log_2$. By partitioning the input string into
blocks of length $\alpha(f)$ we can show:



Lemma 2 Let W be a p^f^p-distrib. rand, variable with p'^f„p G Bsj^n[p], 
then it holds for any function / G Aconf and for any i, £ with tQ) + £ • a{f) < i: 
Pr[prOf(VF, f) > (t' -I- 1) • a(f)] < (1 — PfY- If further r(/) -I- £ ■ a(f) < n: 
Pr[pro f (W) > {2£- 1) ■ a{f)] < \{n - T{f))/{a{f) ■ £)~\ ■ exp{-£ ■ pf). 



If we partition the input string into blocks of length /?(/) we can analyze the 
maximum length of a diffluent substring: 



Lemma 3 Let f G iFdiS and W be a p^f^ p- distributed random variable. Then 
it holds for e := (Uog 2 n -I- log 2 /?(/) — llog 2 (pjr^)) / log 2 n: 



Pr[proy(IF) > 



(1 



e) -log 2 n / log 2 (p^ • /?(/)] >pl^/l/2 ■ 
4 Upper and Lower Bounds

To prove the upper bound of the expected delay of a NFC computing a prefix 
function we will use a network design based on the circuit presented in jl l)j . The 
construction of a single finite controls is based on a technique of Ladner and 
Fischer. In HH they have shown that for any prefix function / resp. for each 
Moore-machine Mf there exists a groupoid {E f,0f) with binary associative op- 
erator (g)/ and two homomorphisms gi,g 2 such that / = g 2 oPP/-^^ 

Using this construction the size oi Ef is exponential in \Qf\ and therefore at 
least exponential in Since both homomorphisms gi and <72 can be com- 

puted by an NFC within constant depth we will focus our analysis on a network 
computing 

To compute a given prefix function / on inputs of length n = 2^ — 2 we 
will use a NFC as illustrated in figure |3 The NFC iV^ET consists of 

n = 2^ — 2 input nodes {a;o, • . . , n output nodes {yo, . . . ,yn-i}, and 
2 k +2 _ 2^ _ 4 . _ 10 internal nodes. The internal nodes are partitioned into 
three types of gates: The upper part, denoted by Ak, consists of nodes with 
0 < J < A: and 0 < f < 2^“^+^ — 3 where every node uf is the root of a complete 
binary tree of height j with leaves Xi. 23 , ■ ■ • Further, it holds that 

:= Xi and u{ := u^ 2 i^‘^f'^i 7 +i i. As a whole the upper part of AI^et i® 

a forest of complete binary trees, in which each height up to k appears exactly 
twice and the heights decrease when going from left to right. The lower part Bk 
consists of nodes wj with 1 < j < A: -I- 1 and 0 < f — 2 where 

vl = 0/ y- 2 i+i = w} 

wl = vl ®f ®f u\~'^ y 2 i = vl . 

If each pair {vf ,wl) is collapsed to a single node the resulting topology of the 
lower part is a collection of complete binary trees, in which each height up to k 
appears exactly once and the heights increase when going from left to right. 

Lemma 4 Let $f \in \mathcal{F}_{conf}$, then it holds for any $n = 2^k - 2$ with $k \in \mathbb{N}$ and for
any input $w \in \Sigma_i^n$ that $time_{N^k_{NET}}(w) \leq 3 \cdot \log_2(pro_f(w)) + 6$.

From Lemmata 13 and 0 we can conclude: 

Theorem 2 Let f G IFconi and IV^et ® NFCs that computes f on all inputs of 
length nG [2^“^ — 1; 2^ — 2], then it holds et\Tcie{N^-^r^,Bsi,n[p]) < 3 • Hog n + 
3 • log2Pj ^ + Cf where cj is a constant depending on f . Lf further f G .Fstr-conf, 
then timejyk^^(w) < 6 • log2(|A'|) -|- 3 for all w G Ef. 

In the following we investigate the expected delay of NFCs of degree 2 (the 
general case of degree d follows similarly). Thus, a lower bound for a strictly 
diffluent prefix function / G .Fstr-diff follows directly from the bounded fanout: 

Theorem 3 Let f G .Fstr-diff, then it holds for all n with n > r(/) -|- \Ef \ an for 
a constant Cf depending on f Bsi,n[p]) > 5 • log2 n — I . 

log2P"^ - Cf . 
Fig. 2. An average case optimal NFC $N^k_{NET}$ with $k = 4$ for input vectors of
length 14. (Diagram labels: $C_{k-1}$, $A_{k-1}$, $B_{k-1}$, $D_{k-1}$.)



Similarly to Theorem 0 it follows from Lemma 0 

f I 

Theorem 4 Let f € Tdis, then etime(7V4^EXj ^i:i,n[p]) > ^- 4 — (Hogn — c/) 
where Cf := \Qf\ •log 2 P"^ +log 2 log 2 PZ^ + 0 (log 2 /?(/)) • 

Summarizing, we can conclude for constant probabilities $p$, like those of the uniform
distribution, that only three different cases can arise: either a prefix
function can be computed with a constant average delay if $f \in \mathcal{F}_{str-conf}$, or
the expected delay is in $\Theta(\mathrm{llog}\, n)$ if $f \in \mathcal{F}_{conf-diff}$, or it is in
$\Theta(\log n)$ if $f \in \mathcal{F}_{str-diff}$. Thus, it follows for the examples illustrated in
Figure 1 that the negation $f_{not}$ can be computed within a constant delay, the carry
propagation $f_{carry}$ within an expected delay of order $\mathrm{llog}\, n$, whereas the parity
function $f_{parity}$ as well as the fourth example function require networks with an expected delay of
$\Theta(\log n)$.
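The $\mathrm{llog}\, n$ behaviour of the carry propagation can be made plausible with a small simulation. The sketch below is an informal illustration, not part of the paper's analysis: it assumes uniformly random addend bits, so that each position is `pro` with probability 1/2 and `gen` or `del` with probability 1/4 each, and it measures the longest run of `pro` symbols, i.e. the longest stretch that contains no confluence of the carry machine. The function names and the sampling scheme are assumptions.

```python
import math
import random

def longest_propagate_run(w):
    """Length of the longest block of consecutive 'pro' symbols in w."""
    best = run = 0
    for x in w:
        run = run + 1 if x == "pro" else 0
        best = max(best, run)
    return best

def average_longest_run(n, trials, rng):
    """Empirical mean of the longest 'pro' run over random inputs of length n."""
    symbols = ["pro", "pro", "gen", "del"]     # 'pro' with probability 1/2
    total = 0
    for _ in range(trials):
        total += longest_propagate_run(rng.choice(symbols) for _ in range(n))
    return total / trials

rng = random.Random(0)
n = 1024
avg = average_longest_run(n, trials=200, rng=rng)
delay_scale = math.log2(max(avg, 1.0))         # same order as log2(log2(n))
```

The average longest run grows like $\log_2 n$, so a delay bound that is logarithmic in this length, as in Lemma 4, is of order $\log_2 \log_2 n$ on average.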



5 The Unbounded Fanout Case 

Investigating the example function from subsection 12. 21 it is easy to see 

that it can be computed by an NFC with unbounded fanout for all inputs in 
constant time. To find a characterization of the prefix function according to the 
average time complexity in processor networks with unbounded out-degree we 
have to examine the structure of a given Moore-machine more carefully. 
Definition 8 A subset $Q' \subseteq Q$ is called a component of $M$ if $Q'$ is a maximal
strongly connected component of the transition graph $G_M$. For $q \in Q$ let $Q[q]$
denote the component that contains $q$. A component $Q'$ is called closed if any
state that is reachable from a state of Q' belongs to Q' . Let Q’^ be the set of all 
states of Q belonging to a closed component of M . 
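The closed components of a small machine can be computed directly from reachability. The following sketch is an illustration under assumptions: the function names are hypothetical, the machine is given as a dictionary, and the component of a state is obtained as the set of states that are mutually reachable with it, which is adequate for small state sets (a linear-time strongly connected component algorithm would be preferable for large ones).

```python
def reachable(delta, alphabet, q):
    """All states reachable from q in zero or more steps."""
    seen = {q}
    stack = [q]
    while stack:
        s = stack.pop()
        for x in alphabet:
            nxt = delta[(s, x)]
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def closed_states(delta, alphabet, states):
    """States that belong to a closed component of the transition graph."""
    reach = {q: reachable(delta, alphabet, q) for q in states}
    closed = set()
    for q in states:
        component = {p for p in reach[q] if q in reach[p]}   # mutually reachable with q
        if reach[q] <= component:                            # nothing leaves the component
            closed.add(q)
    return closed

# Parity machine with a separate start state "s" that is never re-entered.
parity_delta = {("s", 0): 0, ("s", 1): 1,
                (0, 0): 0, (0, 1): 1,
                (1, 0): 1, (1, 1): 0}
closed = closed_states(parity_delta, [0, 1], ["s", 0, 1])
```

Here the start state forms a component of its own that is left after the first input symbol, while the two parity states form the only closed component.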

Applying the confluence and diflluence properties to the closed components of 
the transition graph of an Moore-machine we define: 

Definition 9 A Moore-machine M is (strictly) suffix inherent confluent if 

for all q G Rm(t(M)) n Mq is (strictly) canonical confluent, f G ^prefix 
is (strictly) suffix inherent confluent if Mf is (strictly) suffix inherent confluent. 
Define ^si-conf resp. ^si-str-conf os the set of all functions f G d-p^efix such that 
Mf is (strictly) suffix inherent confluent. M is (strictly) suffix inherent dif- 
fluent if there exists at least one state q G Rm{t{M)) n Qfi such that Mq is 
(strictly) canonical diffluent, f G ^prefix is called (strictly) suffix inherent difflu- 
ent if Mf is (strictly) suffix inherent diffluent. Define iFsi-dis resp. i?^si-str-diff 
as the set of all (strictly) suffix inherent diffluent functions f G iFprefix- Further 

define ^ si-conf-diff- — ^ 3i-conf C tF gi-difF- 

For the four examples of subsection 12.21 it holds /carry c -^si-conf-diff, /parity c 
^si-str-diff, and fnot,w[l]'"^'^^ G iFgi_str-conf- Accoi'ding to Theorem ID we can show: 



Theorem 5 $\mathcal{F}_{prefix}$ is a disjoint union of $\mathcal{F}_{si-str-conf}$, $\mathcal{F}_{si-conf-diff}$, and $\mathcal{F}_{si-str-diff}$.



The lower bounds for fanout unbounded NFCs follow analogously to the lower 
bounds of fanout bounded Networks. Additional to the fanout bounded case we 
have to consider the probability that a prefix of an input maps the starting state 
of the corresponding Moore-machine into a state of a closed component. This 
yields an additional factor of 

To compute a prefix function efficiently on average, we will use a network
whose operation can be subdivided into two steps. In the first step we
will determine the values ruuy'(ic) := min{2^ — 1 | (5(go,w[l;2^ — 1]) G Qm^} 
and qcc '■= runy(r(;)]). This can be done within an expected delay of 

0{p{f)) with p[f):= . exp(pl‘3/l)/(exp(pl‘3/l) _ i) _|_ p-i/2 |Q/| using a 

network design as illustrated in Figure 3. In the second step we will use a network as
presented in the previous section.



Theorem 6 Let f G FpreUx and A^net € Ad net NFC of indegree d then it 

holds 



etime ( A^net , [p] ) 



V log2 d ) 



Hog Tt if f G F gi-conf-difT 
foS2 if f ^ iF gi-str-difF ■ 



Furthermore it holds 

etime(Af;^’ET,^i:i.n[p]) < 



f 3 • Hog n + 0{p{f)) if f G A'si_conf-diff 
1 6 • log 2 \Qf\+ 0{p{f)) if f G A'si-str-conf • 



Hence, the classification of the average time complexity to compute prefix functions
in processor networks is similar in both cases. The unbounded fanout of a
network enables the NFC to distribute some local knowledge to the whole network
within one step. So the existence of a strict diffluence that is only caused
by a prefix of constant length does not determine the computation time of the
network completely.

Fig. 3. Unbounded outdegree NFC computing $\mathrm{run}_f(w)$. (Diagram labels: input; output of the component search phase.)

6 Conclusions 

Our results characterize the average case complexity of prefix functions. They 
allow a significant speedup in the expected delay for two types of networks: net- 
works with bounded and with unbounded fanout. Using the presented algebraic 
properties and network designs we can generate a family of NFCs that achieves 
the shown bounds for all prefix functions. Since the proofs of the lower bounds 
do not make use of the restricted power of finite state machines the results can 
simply be translated to other more powerful parallel models like EREW-PRAMs, 
CREW-PRAMs, and networks of arbitrarily powerful control units. On the other
hand, the lower bounds cannot be applied to CRCW-PRAMs or other models
with unbounded fanin like ub-circuits. For example, it is known that the addition
of two n-bit numbers can be computed by a ub-circuit of polynomial size and
constant depth.

Abstract. We prove that if there is a polynomial time algorithm which
computes the permanent of a matrix of order n for any inverse polynomial
fraction of all inputs, then there is a BPP algorithm computing
the permanent for every matrix. It follows that this hypothesis implies
$P^{\#P} = BPP$. Our algorithm works over any sufficiently large finite field
(polynomially larger than the inverse of the assumed success ratio), or
any interval of integers of similar range. The assumed algorithm can
also be a probabilistic polynomial time algorithm. Our result is essentially
the best possible based on any black box assumption of permanent
solvers, and is a simultaneous improvement of the results of Gemmell
and Sudan [GS92], Feige and Lund [FL92], as well as Cai and Hemachandra
[CH91] and Toda (see [ABG90]).

1 Introduction 

The permanent of an $n \times n$ matrix $A$ is defined as

$$\mathrm{per}(A) = \sum_{\sigma \in S_n} \prod_{i=1}^{n} A_{i,\sigma(i)},$$

where $S_n$ is the symmetric group on n letters, i.e., the set of all permutations of
$\{1, \ldots, n\}$.

The permanent function has a rich history in combinatorics and in compu- 
tational complexity theory. All known general algorithms computing the perma- 
nent function over the integers take exponential time. In fact, this exponential 
time complexity remains true for any general algorithm over any field of characteristic
other than two. The best known general computational procedure for
the permanent is a formula due to Ryser, which runs in time $O(n^2 2^n)$ [Rys63].
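To make the definition and Ryser's formula concrete, here is a small sketch that evaluates the permanent both ways: directly over all permutations, and by the inclusion-exclusion sum over non-empty column subsets, $\mathrm{per}(A) = \sum_{\emptyset \ne S \subseteq \{1,\ldots,n\}} (-1)^{n-|S|} \prod_{i} \sum_{j \in S} a_{ij}$. The function names are illustrative, and the code assumes $n \ge 1$.

```python
from itertools import combinations, permutations
from math import prod


def permanent_by_definition(a):
    """Sum over all permutations sigma of prod_i a[i][sigma(i)] (n! terms)."""
    n = len(a)
    return sum(prod(a[i][s[i]] for i in range(n)) for s in permutations(range(n)))


def permanent_ryser(a):
    """Ryser's inclusion-exclusion formula over non-empty column subsets."""
    n = len(a)
    total = 0
    for size in range(1, n + 1):
        for cols in combinations(range(n), size):
            # Product over rows of the row sum restricted to the chosen columns.
            row_sum_product = prod(sum(row[j] for j in cols) for row in a)
            total += (-1) ** (n - size) * row_sum_product
    return total
```

Both functions use exact integer arithmetic, so they can be compared entry for entry on small matrices.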

In terms of computational complexity theory, in 1979, Valiant [Val79] proved
the seminal result that computing the permanent of integer matrices is complete
for the counting complexity class #P, and is therefore NP-hard. A decade
later, Toda [Tod89] demonstrated the surprising power of #P; Toda's theorem,
together with Valiant's result, implies that the permanent is hard for the entire
polynomial-time hierarchy.

* Supported in part by NSF grant CCR-9634665, and by a Guggenheim Fellowship.
** Supported in part by an NSF CAREER award CCR-9734164.

The permanent has two other fascinating properties that endow it with rich
computational structure. Lipton [Lip91] observed that the permanent of a matrix
with entries that are low-degree polynomials in an indeterminate x is itself
a low-degree polynomial. In particular, taking linear polynomials in x as entries,
the permanent of an n x n matrix is a polynomial of degree at most n. It follows
that the computation of the permanent of a matrix can be reduced, via
polynomial interpolation, to computing the permanent of uniformly distributed
random matrices. This property of random self-reducibility has the important
consequence that the permanent is computationally hard in the worst case if
and only if it is hard on the average. The permanent also has the downward
self-reducibility property, which means that computing the permanent of an n x n
matrix can be reduced (in polynomial time) to the computation of the permanent
of (n - 1) x (n - 1) matrices, via Laplacian expansion. Lund et al. [LFKN90]
applied this property together with the random self-reducibility to obtain the
breakthrough result that $P^{\#P}$ has interactive proof protocols. Very recently,
Impagliazzo and Wigderson [IW98] have used these two properties to obtain exciting
new connections between computational hardness vs. randomness.
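The random self-reduction described above can be sketched as follows, under the assumption that we work over $Z_p$ for a prime $p > n + 1$. For a uniformly random matrix $R$, each $M + xR$ with $x \ne 0$ is uniformly distributed, and $\mathrm{per}(M + xR)$ is a polynomial of degree at most $n$ in $x$, so $n + 1$ evaluations determine its value at $x = 0$. The `oracle` argument stands for any procedure that returns the permanent of the (random) matrices it is given; all names here are hypothetical.

```python
import random
from itertools import permutations
from math import prod


def permanent_mod(a, p):
    """Reference permanent over Z_p by direct expansion (small n only)."""
    n = len(a)
    return sum(prod(a[i][s[i]] for i in range(n)) for s in permutations(range(n))) % p


def lagrange_at_zero(xs, ys, p):
    """Value at 0 of the polynomial of degree < len(xs) through the points (xs[i], ys[i])."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num = den = 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (-xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total


def permanent_via_random_matrices(m, p, oracle, rng):
    """Compute per(m) mod p using the oracle only on matrices m + x*r with x != 0."""
    n = len(m)
    r = [[rng.randrange(p) for _ in range(n)] for _ in range(n)]
    xs = list(range(1, n + 2))  # n + 1 distinct non-zero points; needs p > n + 1
    ys = [
        oracle([[(m[i][j] + x * r[i][j]) % p for j in range(n)] for i in range(n)], p)
        for x in xs
    ]
    return lagrange_at_zero(xs, ys, p)
```

With an oracle that errs on some fraction of inputs, plain interpolation is no longer enough; that is where the error-correcting reconstruction procedures discussed in the text come in.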

The line of research initiated by Lipton [Lip91] connecting the worst-case
and the average-case complexities of the permanent was subsequently pursued
by Gemmell et al. [GLRSW91], Gemmell and Sudan [GS92], and by Feige and
Lund [FL92]. It is shown in [GS92] that if there is a polynomial time algorithm
that can compute the permanent on at least a $(1/2) + (1/\mathrm{poly})$ fraction of all
n x n matrices over the finite field $Z_p$, then there is a probabilistic polynomial
time algorithm that can compute the permanent of every n x n matrix over $Z_p$,
provided p is sufficiently large with respect to n. Feige and Lund [FL92] improved
this and showed a similar result under a weaker hypothesis that assumes
an algorithm that can compute the permanent on at least a $(1/2) - 1/n$ fraction
of all n x n matrices over the finite field $Z_p$. A polynomial reconstruction
algorithm (a.k.a. Reed-Solomon decoder) of Berlekamp and Welch [BW] is the
crucial technical procedure in both of these papers. The interactive protocol of
[LFKN90] is another crucial ingredient in [FL92].

Another result regarding the computational complexity of the permanent is
due to Cai and Hemachandra [CH91] and independently due to Toda
(see [ABG90]). According to this result, if there is a polynomial time algorithm
such that for every matrix of order n, it can enumerate a list of polynomially 
many values, one of which is the correct value of the permanent, then one can 
compute the permanent of any matrix in polynomial time. 

In this note, we show that if there is a polynomial time algorithm that can
compute the permanent on at least an inverse polynomial fraction ($1/n^k$) of all
n x n matrices over the finite field $Z_p$, then there is a probabilistic polynomial
time algorithm that can compute the permanent of every n x n matrix over
$Z_p$, provided p is somewhat larger than the inverse of the success ratio of the
assumed algorithm. The proof can be extended to the case where the assumed
algorithm is also a probabilistic polynomial time algorithm. We also prove the
same result for matrices over any finite field (again assuming that the field
size is sufficiently large compared to the inverse of the success ratio), and over
integers (with an appropriate restriction on the length of the entry size and a
corresponding definition of uniform distribution over such an interval of integers).
These results are a simultaneous improvement of the results in [GS92,FL92] as
well as in [CH91] (except replacing probabilistic for deterministic polynomial
time algorithm in the latter).

We achieve this improvement by applying an improved Reed-Solomon decoder
due to Madhu Sudan [Sud96], building on earlier work by Ar et al.
[ALRS92]. Our proof uses a family of intermediate matrices whose entries are
univariate polynomials in an indeterminate x; this definition unifies two ideas used
in previous papers, and makes the proof simple and completely self-contained.



2 Hardness over Zp 

We begin with a theorem concerning the complexity of permanent over the finite 
field Zp, where p is somewhat larger than the inverse of the success ratio of the 
assumed algorithm. 

Theorem 1. For any constant k > 0, if there exists a deterministic polynomial
time algorithm A such that for all n and $p \ge 9n^{2k+2}$, A computes the permanent
of order n over the field $Z_p$ correctly on greater than $1/n^k$ fraction of inputs,
then $P^{\#P} = BPP$.

Proof. Assume that the success probability of A is q. Then $q > 1/n^k$. Without
loss of generality we assume $k \ge 1$. Let M be an n x n matrix over $Z_p$ whose
permanent we wish to compute. Let $M_{11}, M_{21}, \ldots, M_{n1}$ be the n minors of M.
Consider the following matrix polynomial, defined by choosing the matrices B
and C (of dimension n - 1) uniformly and independently at random,

$$D(x) = \sum_{i=1}^{n} \delta_i(x) M_{i1} + \alpha(x)(B + xC),$$

where each $\delta_i(x)$ is a polynomial of degree n - 1 such that

$$\delta_i(x) = \begin{cases} 1 & \text{if } x = i, \\ 0 & \text{if } x \ne i \text{ and } 1 \le x \le n, \end{cases}$$

and $\alpha(x)$ is a degree-n polynomial that vanishes for x = 1, 2, ..., n, given by
$\alpha(x) = (x-1)(x-2)\cdots(x-n)$. Note that $D(i) = M_{i1}$, for $1 \le i \le n$.
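The construction of $D(x)$ can be checked numerically. The sketch below evaluates $D(x)$ over $Z_p$ for a concrete $x$, using the Lagrange basis polynomials for $\delta_i$ and the product form of $\alpha$, and assumes a prime $p > n$ so that the required inverses exist. Function names and the 0-based row indexing inside `minor` are choices made for this example.

```python
def minor(m, i, j):
    """Matrix m with row i and column j removed (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def delta(i, x, n, p):
    """Degree n-1 Lagrange basis polynomial: 1 at x = i, 0 at the other points 1..n."""
    num = den = 1
    for j in range(1, n + 1):
        if j != i:
            num = num * (x - j) % p
            den = den * (i - j) % p
    return num * pow(den, -1, p) % p


def alpha(x, n, p):
    """(x - 1)(x - 2)...(x - n) mod p."""
    out = 1
    for j in range(1, n + 1):
        out = out * (x - j) % p
    return out


def d_matrix(x, m, b, c, p):
    """Evaluate D(x) = sum_i delta_i(x) * M_{i1} + alpha(x) * (B + x*C) over Z_p."""
    n = len(m)
    size = n - 1
    a = alpha(x, n, p)
    out = [[a * (b[r][s] + x * c[r][s]) % p for s in range(size)] for r in range(size)]
    for i in range(1, n + 1):
        d = delta(i, x, n, p)
        mi = minor(m, i - 1, 0)  # M_{i1}: delete row i and the first column
        for r in range(size):
            for s in range(size):
                out[r][s] = (out[r][s] + d * mi[r][s]) % p
    return out
```

At the points $x = 1, \ldots, n$ the random part vanishes and the minors are recovered exactly; at every other point the term $\alpha(x)(B + xC)$ randomizes the matrix.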

Let $D = \{D(x) \mid x = n+1, n+2, \ldots, p\}$. The permanent of D(x) is a
polynomial of degree less than $n^2$. If we know the polynomial per(D(x)) then
we can compute the permanent of M by Laplacian expansion along the first
column, since per(D(i)) is the permanent of the minor $M_{i1}$. If the matrices
B and C are chosen uniformly at random, then all the matrices in D are uniformly
distributed. Moreover, if the matrices B and C are chosen uniformly and
independently, then it can be shown easily that the matrices in D are pairwise
independent. So the algorithm A has success probability close to q on D also.
More precisely, the following lemma can be proved by the Chebyshev inequality.

Lemma 1. The probability (over random choices of B and C) that the algorithm
A correctly computes the permanent on more than q/2 fraction of inputs from
D is at least $1 - \frac{1}{(p-n)q^2}$.

Proof. For i = n+1, n+2, ..., p, define $Z_i$ to be 1 if A succeeds on D(i), else 0.
We know that the expectation $E(Z_i) = q$ for i = n+1, n+2, ..., p. Define the
random variable $Z = \frac{1}{p-n}\sum_{i=n+1}^{p} Z_i$ as the average of the $Z_i$'s. The expectation of Z is
also q.

$$\Pr[Z \le q/2] = \Pr[Z - E(Z) \le -q/2] \le \Pr[|Z - E(Z)| \ge q/2] \le \frac{\mathrm{Var}(Z)}{(q/2)^2} \le \frac{1}{(p-n)q^2}.$$

The last inequality is obtained using the facts that the $Z_i$'s are pairwise independent
and the variance of a 0-1 random variable is at most 1/4. So with high probability
A works correctly on more than q/2 fraction of inputs from D. □
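For completeness, the variance step behind the last inequality can be written out as follows; this is a routine calculation that uses only pairwise independence of the $Z_i$ and the bound $\mathrm{Var}(Z_i) \le 1/4$.

```latex
\mathrm{Var}(Z)
  = \frac{1}{(p-n)^2}\sum_{i=n+1}^{p}\mathrm{Var}(Z_i)
  \le \frac{1}{(p-n)^2}\cdot\frac{p-n}{4}
  = \frac{1}{4(p-n)},
\qquad\text{hence}\qquad
\frac{\mathrm{Var}(Z)}{(q/2)^2}
  \le \frac{4}{q^2}\cdot\frac{1}{4(p-n)}
  = \frac{1}{(p-n)\,q^2}.
```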

We call a set D (which is defined by choosing B and C) good if A works
correctly on more than q/2 fraction of inputs from D. In the following discussion
assume that D is good.

For a polynomial f, Graph(f) is the set $\{(i, f(i)) \mid i = 1, 2, \ldots, p\}$. Define
a set S as follows: $S = \{(i, A(D(i))) \mid i = n+1, \ldots, p\}$. Consider the set of all
polynomials f of degree less than $n^2$ such that S intersects the set Graph(f)
in at least (p - n)q/2 points. The polynomial per(D(x)) is one among them,
since A computes the permanent correctly on at least (p - n)q/2 matrices from D. We
can prove that there exist at most polynomially (in n) many such polynomials f.

Lemma 2. If $p \ge 9n^{2k+2}$ then there are at most 3/q many polynomials f of
degree less than $n^2$ that satisfy $|Graph(f) \cap S| \ge (p-n)q/2$.

Proof. If there exist more than 3/q such polynomials, consider a set $\mathcal{F}$ of $N = \lceil 3/q \rceil$
polynomials among them. For a polynomial f define the set
$S_f = \{i \mid (i, f(i)) \in Graph(f) \cap S\}$. By the inclusion-exclusion principle

$$p - n \ge \Bigl|\bigcup_{f \in \mathcal{F}} S_f\Bigr| \ge \sum_{f \in \mathcal{F}} |S_f| - \sum_{\{f, f'\} \subseteq \mathcal{F},\, f \ne f'} |S_f \cap S_{f'}| \ge N\,\frac{(p-n)q}{2} - \frac{N(N-1)}{2}\,(n^2 - 1).$$

The last inequality uses the fact that any two distinct polynomials of degree less
than $n^2$ can agree on no more than $n^2 - 1$ points. If $N = \lceil 3/q \rceil$ then $N - 1 < 3/q$







Jin-Yi Cai, A. Pavan, and D. Sivakumar 



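and $N \ge 3/q$. Since $p \ge 9n^{2k+2}$ and $q > 1/n^k$, we have $p - n \ge 9(n^2 - 1)/q^2$.
Thus we obtain the following contradiction:

$$\begin{aligned}
p - n &\ge \frac{N}{2}\Bigl((p-n)q - (N-1)(n^2-1)\Bigr) \\
      &> \frac{N}{2}\Bigl((p-n)q - \frac{3(n^2-1)}{q}\Bigr) \\
      &\ge \frac{3}{2q}\Bigl((p-n)q - \frac{3(n^2-1)}{q}\Bigr) \\
      &= p - n + \frac{1}{2}\Bigl[p - n - \frac{9(n^2-1)}{q^2}\Bigr] \\
      &\ge p - n.
\end{aligned}$$

□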

Next we show how all these polynomials can be obtained explicitly by a 
randomized procedure with high probability. To handle a minor technical com- 
plication, we will divide this procedure into two cases. We remind the reader that 
we are working under the assumption that the set D is good, that is, the fraction
of inputs from D for which A works correctly is more than $q/2 > 1/(2n^k)$.

(Case $9n^{2k+2} \le p \le 161n^{3k+2}$):

In this case, we apply the algorithm A to the matrix D(x) for every x =
n+1, ..., p. Clearly, this can be accomplished in polynomial time. Since D
is assumed to be good, A computes the permanent correctly for at least
$(p-n)/(2n^k)$ of these p - n matrices. Letting L = p - n and $d = n^2$, we have a list of
L pairs $\{(x_j, y_j) \mid j = 1, \ldots, L\}$, such that the $x_j$'s are all distinct, and moreover,
there is a polynomial f of degree at most d (namely per(D(x))) whose graph
intersects the list in at least $(p-n)/(2n^k)$ places. The condition $p \ge 9n^{2k+2}$
implies that $(p-n)/(2n^k) > \sqrt{2(p-n)n^2} = \sqrt{2Ld}$, for all n.

(Case $p > 161n^{3k+2}$):

Pick $L = 40n^{2k+2}$ many values x uniformly and independently from the set
{n+1, ..., p}. For the set of (distinct) x's produced by this process, produce
the matrix D(x) and apply the algorithm A to it. Our goal is to argue that with
overwhelming probability, for at least $9n^{k+2}$ distinct x's (out of the L choices
made), A will give us the correct value of the permanent. (The reason for the
choice of 161 and 9 with respect to 40 will be clear shortly.)

Call an x lucky if A computes per(D(x)) correctly. We define the events $E_j$,
j = 1, ..., L, as follows: the event $E_j$ occurs if the j-th random choice of x is
lucky, and the j-th random choice of x is distinct from all the previously chosen
lucky x's. The worst case for $E_j$ to occur is when j = L and all the previously
chosen x's were distinct and lucky, thus making it least likely that the j-th choice
from {n+1, ..., p} is a lucky x distinct from all previous points. Even in this
case, there are more than $((p-n)/(2n^k)) - L$ distinct lucky x's that have not
been picked, and which, if picked next, would cause $E_j$ to occur. Therefore, for
every j, and under any condition on the events $E_1, E_2, \ldots, E_{j-1}$, the probability
that $E_j$ occurs is at least

$$\frac{1}{2n^k} - \frac{L}{p-n} = \frac{1}{2n^k} - \frac{40n^{2k+2}}{p-n} \ge \frac{1}{2n^k} - \frac{1}{4n^k} = \frac{1}{4n^k},$$

since $p > 161n^{3k+2}$. With this estimate we will prove that with high probability
at least $9n^{k+2}$ out of the L events occur.

The event "at least $9n^{k+2}$ out of L $E_j$'s occur" is stochastically dominated
by the event that we obtain at least $9n^{k+2}$ successes in L Bernoulli trials, each
having an independent success probability of $1/(4n^k)$. This can be seen by performing
the L Bernoulli trials in the following fashion: first, perform the experiment
which defines $E_j$, where an occurrence of $E_j$ is counted as a success in the
j-th Bernoulli trial; second, for each j, if $E_j$ did not occur, let the j-th Bernoulli
trial be a success with probability $1/(4n^k) - e_j$, where $e_j$ is the conditional probability
of $E_j$ occurring given the previous occurrences (and non-occurrences) of
$E_1, E_2, \ldots, E_{j-1}$.

Now the probability of the event "at least $9n^{k+2}$ successes in L Bernoulli
trials with success probability of $1/(4n^k)$" can be shown to be very close to one,
using Chernoff bounds. Therefore, with very high probability we have $9n^{k+2}$
distinct x's for which the value of the polynomial per(D(x)) of degree less than
$n^2$ is available. Of course, these x's and the values of per(D(x)) are part of a list
of at most L pairs $(x_j, y_j)$. Once again, recalling that $L = 40n^{2k+2}$ and letting
$d = n^2$, we notice that $9n^{k+2} > \sqrt{2Ld} = \sqrt{80}\,n^{k+2}$.
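The role of the constants 9, 40 and 161 can be checked with exact integer arithmetic. The sketch below tests three inequalities: the Case 1 margin $(p-n)/(2n^k) > \sqrt{2Ld}$ with $L = p - n$, the Case 2 bound $1/(2n^k) - L/(p-n) \ge 1/(4n^k)$ with $L = 40n^{2k+2}$, and the Case 2 threshold $9n^{k+2} > \sqrt{2Ld}$, all with $d = n^2$. The function names are illustrative; each comparison is rewritten without square roots or fractions so that no rounding is involved.

```python
def case1_margin_holds(n, k, p):
    """Case 1: (p - n)/(2 n^k) > sqrt(2 L d) for L = p - n, d = n^2."""
    L, d = p - n, n * n
    # Both sides are positive, so compare their squares after clearing denominators.
    return L * L > (2 * n**k) ** 2 * (2 * L * d)


def case2_success_probability_holds(n, k, p):
    """Case 2: 1/(2 n^k) - L/(p - n) >= 1/(4 n^k) for L = 40 n^(2k+2)."""
    L = 40 * n ** (2 * k + 2)
    # Equivalent to L/(p - n) <= 1/(4 n^k), i.e. 4 n^k L <= p - n.
    return 4 * n**k * L <= p - n


def case2_threshold_holds(n, k):
    """Case 2: 9 n^(k+2) > sqrt(2 L d) for L = 40 n^(2k+2), d = n^2."""
    L, d = 40 * n ** (2 * k + 2), n * n
    return (9 * n ** (k + 2)) ** 2 > 2 * L * d
```

For Case 2 the expected number of successes in the Bernoulli comparison is $L/(4n^k) = 10n^{k+2}$, which leaves room above the threshold $9n^{k+2}$ for a Chernoff bound. Running the first check with n = 1 and p = 9 shows that the Case 1 margin is tight there, so the sketch is only exercised for $n \ge 2$.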

To summarize, in either case, with high probability, we have a list $\{(x_j, y_j)\}$
of at most L pairs (for some L which is polynomially bounded in n) such that
the graph of the polynomial per(D(x)) of degree at most $d = n^2$ intersects the
list in more than $\sqrt{2Ld}$ places. Given such a list, the following lemma of Sudan
[Sud96], building on earlier work by [ALRS92], shows how one can construct all
the polynomials f whose graphs intersect the list in more than $\sqrt{2Ld}$ places.
We note that Sudan's procedure is based on bivariate polynomial factoring.
Therefore, it can be implemented in randomized polynomial time in L and d
and log p, or in deterministic time polynomial in L, d, and p (see [Kal92]).

Lemma 3 ([Sud96]). Given a sequence of L distinct pairs $\{(x_i, y_i) \mid 1 \le i \le L\}$,
where $x_i$ and $y_i$ are elements of a field $\mathbb{F}$, let t and d be integers such that
$t > \sqrt{2Ld}$. Then there is a probabilistic polynomial time algorithm that finds all
polynomials f of degree at most d such that the number of i's such that $y_i = f(x_i)$
is at least t.



Sudan’s polynomial reconstruction algorithm is very elegant and simple. For 
the sake of completeness, we will sketch his algorithm here. 

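Consider all bivariate polynomials of the form $F(x, y) = \sum_{i,j} a_{ij} x^i y^j$, where
$i + jd \le \sqrt{2Ld}$. Thus there are exactly

$$\sum_{j=0}^{\lfloor\sqrt{2L/d}\rfloor} \Bigl(\lfloor\sqrt{2Ld}\rfloor + 1 - jd\Bigr) = \Bigl(\lfloor\sqrt{2Ld}\rfloor + 1\Bigr)\Bigl(\lfloor\sqrt{2L/d}\rfloor + 1\Bigr) - \frac{d}{2}\,\lfloor\sqrt{2L/d}\rfloor\Bigl(\lfloor\sqrt{2L/d}\rfloor + 1\Bigr)$$

many coefficients $a_{ij}$. This quantity can be shown to be strictly greater than L
as follows:
Let $l = \lfloor\sqrt{2L/d}\rfloor$, then $\sqrt{2L/d} = l + x$, for some $0 \le x < 1$. Since
$\lfloor\sqrt{2Ld}\rfloor + 1 > \sqrt{2Ld}$, the total sum is strictly greater than

$$\Bigl(\sqrt{2L/d} + 1 - x\Bigr)\sqrt{2Ld} - \frac{d}{2}\Bigl(\sqrt{2L/d} + 1 - x\Bigr)\Bigl(\sqrt{2L/d} - x\Bigr),$$

which can be simplified to $L + \frac{d}{2}\bigl(\sqrt{2L/d} + x(1-x)\bigr) > L$.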

Now if we set $F(x_i, y_i) = 0$ for all $1 \le i \le L$, we have a homogeneous
linear equation system in the coefficients $a_{ij}$, with more unknowns than L, the
number of equations, and hence we can find at least one non-trivial bivariate
polynomial F. If we substitute y = f(x), where f is a polynomial of degree at
most d that passes through at least t points $(x_i, y_i)$, then we have a univariate
polynomial F(x, f(x)) of degree at most $\sqrt{2Ld}$ that vanishes on $t > \sqrt{2Ld}$ many
points. Hence F(x, f(x)) is identically 0 in $\mathbb{F}[x]$. Thus as polynomials in $\mathbb{F}(x)[y]$,
$y - f(x) \mid F(x, y)$. But since y - f(x) is monic and in fact belongs to $\mathbb{F}[x][y]$,
$y - f(x) \mid F(x, y)$ also in $\mathbb{F}[x, y]$.

Clearly y - f(x) is irreducible in $\mathbb{F}[x, y]$, which is a UFD. In probabilistic
polynomial time one can factor a polynomial in $\mathbb{F}[x, y]$; then y - f(x)
must be one of the irreducible factors.
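The two steps of this sketch (interpolate a non-zero F, then extract the factors y - f(x)) can be tried out on a toy instance over a small prime field. The code below follows the interpolation step as described, but it replaces bivariate factoring by a brute-force search over all candidate polynomials f of degree at most d, which is only feasible for tiny p and d; that substitution, the function names, and the coefficient-tuple representation of f are assumptions of this example, not part of Sudan's algorithm.

```python
from itertools import product
from math import isqrt


def nullspace_vector(rows, ncols, p):
    """One non-zero solution v of rows * v = 0 over Z_p (requires ncols > len(rows))."""
    rows = [r[:] for r in rows]
    pivots = {}  # pivot column -> row index
    r = 0
    for c in range(ncols):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] % p), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        inv = pow(rows[r][c], -1, p)
        rows[r] = [v * inv % p for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] % p:
                factor = rows[i][c]
                rows[i] = [(a - factor * b) % p for a, b in zip(rows[i], rows[r])]
        pivots[c] = r
        r += 1
        if r == len(rows):
            break
    free = next(c for c in range(ncols) if c not in pivots)
    v = [0] * ncols
    v[free] = 1
    for c, i in pivots.items():
        v[c] = -rows[i][free] % p
    return v


def sudan_list_decode(points, d, t, p):
    """All f (as coefficient tuples, constant term first) of degree <= d over Z_p
    that agree with at least t of the points; requires t > sqrt(2*L*d)."""
    L = len(points)
    bound = isqrt(2 * L * d)  # floor(sqrt(2Ld)), the weighted-degree bound
    assert t * t > 2 * L * d and bound < p
    monos = [(i, j) for j in range(bound // d + 1) for i in range(bound - j * d + 1)]
    rows = [[pow(x, i, p) * pow(y, j, p) % p for i, j in monos] for x, y in points]
    coeffs = nullspace_vector(rows, len(monos), p)

    def big_f(x, y):
        return sum(a * pow(x, i, p) * pow(y, j, p) for a, (i, j) in zip(coeffs, monos)) % p

    found = []
    for f in product(range(p), repeat=d + 1):
        def ev(x, f=f):
            return sum(c * pow(x, e, p) for e, c in enumerate(f)) % p
        # F(x, f(x)) has degree <= bound < p, so vanishing on all of Z_p means it is 0.
        if all(big_f(x, ev(x)) == 0 for x in range(p)):
            if sum(ev(x) == y for x, y in points) >= t:
                found.append(f)
    return found
```

In the toy instance used for checking, eight points over $Z_{13}$ contain five values of the line 3 + 2x and three corrupted values; since $5 > \sqrt{2 \cdot 8 \cdot 1} = 4$, the line must divide the interpolated F and is the only candidate returned.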

We return to the proof of Theorem 1. Thus we have a randomized procedure
that, with high probability, computes a list of at most $3/q = O(n^k)$ polynomials,
such that one of them is the permanent of the (n - 1) x (n - 1) matrix D(x).
The remaining task is to identify the correct polynomial from this list. Two
ideas come into play here: First, we can find in deterministic polynomial time a
point $v \in Z_p$ such that all 3/q polynomials disagree on v. This is because each
pair of polynomials can agree on at most $n^2 - 1$ points, and there are strictly fewer
than $9n^{2k}$ such pairs, and $p \ge 9n^{2k+2}$. Secondly, if we can somehow obtain the
correct value of per(D(v)), then we can eliminate all but at most one polynomial
on our list, by cross-checking the values of each polynomial in our list at v
against the correct value of per(D(v)). Assuming D is good, then with very high
probability, the correct polynomial per(D(x)) is on the list of $O(n^k)$ polynomials,
thus exactly one polynomial must remain, and the remaining polynomial must
be the correct per(D(x)). Furthermore, D is good with high probability. Once
we have the correct polynomial, by evaluating the correct polynomial per(D(x))
at x = 1, ..., n, we may compute the permanents of the n minors of the matrix
M that we started with, and thus also compute per(M).

What we have achieved is a reduction, using A as an auxiliary procedure,
of the computation of the permanent of n x n matrices to the computation of
the permanent of (n - 1) x (n - 1) matrices. The reduction is probabilistic, and
has a high probability of success. In fact, the error probability is bounded by
the probability that D is not good, which is at most $1/((p-n)q^2) = O(1/n^2)$,
plus the probability that given a good D we still did not get a sufficient number
of distinct points on which A evaluates correctly, which is exponentially small,
plus the failure probability of the factoring algorithm, which also can be made
exponentially small. Thus the overall error probability is $O(1/n^2)$, say $c/n^2$ for
some constant c > 0. Therefore, if we carry this process through n, n - 1, ..., K
for some large constant K, the total error probability is bounded by $\sum_{j \ge K} (c/j^2)$,
which can be made arbitrarily small, say $\epsilon < 1/2$, by choosing a large enough
K. This implies that we may use the same procedure recursively to compute
the correct value of per(D(v)). The recursion terminates when the order of the
matrix becomes less than K, and we can compute the permanent directly.
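One way to see how large K must be is the following telescoping estimate; it is a standard bound added here as a worked step, with c the constant from the error probability $c/j^2$ at recursion level j.

```latex
\sum_{j=K}^{n} \frac{c}{j^{2}}
  \;<\; \sum_{j=K}^{n} \frac{c}{j(j-1)}
  \;=\; c\left(\frac{1}{K-1} - \frac{1}{n}\right)
  \;<\; \frac{c}{K-1},
\qquad\text{so any } K \ge 2c + 1 \text{ gives total error below } \tfrac{1}{2}.
```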

Finally, to show that $P^{\#P} = BPP$, we need a probabilistic polynomial time
algorithm for the permanent with negligible (less than any inverse polynomial)
error probability. We can achieve that by repeating the above algorithm for
the permanent a sufficiently large polynomial number of times, and taking the
majority vote. By the Chernoff bound, this will succeed with exponentially small
error probability. □
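The amplification step can be illustrated with a short calculation. The sketch below takes the most frequent answer among repeated runs and computes, exactly, the probability that at least half of m independent runs are wrong when each run errs with probability eps; this is an upper bound on the failure probability of the vote, since the correct value wins whenever more than half of the runs are right. The function names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from math import comb


def plurality(values):
    """Most frequent value among the outcomes of repeated runs."""
    return Counter(values).most_common(1)[0][0]


def majority_failure_bound(eps, m):
    """Probability that at least ceil(m/2) of m independent runs err,
    when each run errs with probability eps."""
    need = (m + 1) // 2
    return sum(comb(m, i) * eps**i * (1 - eps) ** (m - i) for i in range(need, m + 1))
```

Passing `eps` as a `Fraction` keeps the result exact; for example, with eps = 1/3 three runs already lower the bound from 1/3 to 7/27, and the bound keeps shrinking as the odd number of runs grows.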

3 Some Extensions 

The above proof can be seen to work over any finite field as long as the cardinality
of the field is at least as large as $9n^{2k+2}$ and also there is a polynomial length
representation of field elements.

Theorem 2. Fix any constant k > 0. If there exists a deterministic polynomial
time algorithm A such that for all n, A computes the permanent of order n over
the finite field F of characteristic other than two correctly on greater than $1/n^k$
fraction of inputs, where $|F| \ge 9n^{2k+2}$ and each field element has a representation
of bit length polynomial in n, then $P^{\#P} = BPP$.

Similarly we can extend the proof to the case of integers. Admittedly this is 
the most interesting case classically. But our proof will reduce this case to the 
case with a finite field Zp, for an appropriate prime p. 

We must first define properly what is meant by the uniform distribution
of integer n x n matrices. For our purposes, we will define it simply as follows:
Consider any bit length $\ell$ between $\Omega(\log n)$ and a polynomial in n. Then we consider the uniform
distribution of all integer n x n matrices where each entry of the matrix is an
integer with absolute value bounded by $2^\ell$.

Theorem 3. For any constant k > 0, if there exists a deterministic polynomial
time algorithm A such that for all n, A computes the permanent of order n over
the integers in the interval $[-2^\ell, 2^\ell]$ on greater than $1/n^k$ fraction of inputs,
where $\ell \ge (2k+3)\log_2 n$ and $\ell$ is polynomially bounded in n, then $P^{\#P} = BPP$.

We prove this by choosing a prime close to $2^\ell$ and reasoning in the finite field
$Z_p$. We omit the details.

The proof of our theorems can also be carried out assuming only the existence 
of a probabilistic polynomial time algorithm B with the expected success ratio 
at least inverse polynomial on an inverse polynomial fraction of the inputs. We 
omit the proof here. 

Theorem 4. The same result holds in the above theorems if we assume the
existence of a probabilistic polynomial time algorithm only.

Our theorems are an improvement of the results in [GS92,FL92]. They also
simultaneously generalize the results in [CH91]. In this latter result, it is assumed
that there exists a polynomial time algorithm such that for every matrix of
order n, it can enumerate a list of polynomially many values, one of which is the
correct value of the permanent. By taking one entry from such a list at random,
we obtain a probabilistic polynomial time algorithm with an inverse polynomial
success ratio.

In a related development (after this paper was submitted to STACS), Goldreich,
Ron, and Sudan published a technical report in ECCC [GRS98] showing
that if there is a polynomial time algorithm B that is able to guess the permanent
of a random n x n matrix on 2n-bit integers modulo a random n-bit prime with
inverse polynomial success probability, then $P^{\#P} = BPP$. To prove this result,
they develop algorithmic tools to decode an error correcting code based on the
Chinese Remainder Theorem. While their decoding algorithms may be of independent
interest, the above result on the permanent, which is the main motivation
for their algorithm, may be proved directly from our results in this paper.

Specifically, given such an algorithm B, we can show how to produce an algorithm
A that meets the hypothesis of Theorem 1. The idea is to randomly choose
p(n) many n-bit primes for a large polynomial p(n) (by choosing polynomially
many integers of this size and applying a probabilistic primality test). With high
probability, we will be able to find sufficiently many primes p such that the algorithm
B succeeds with a fixed inverse polynomial probability in computing the
permanent modulo p on random n x n matrices. Here "sufficiently many" means
that the number of such primes, which we will call good primes, is sufficient, via
Chinese remaindering, to describe any integer that might be the value of the
permanent of n x n matrices of 2n-bit integers. Furthermore, by the distribution
of primes according to the Prime Number Theorem, with high probability, we
will have sufficiently many good primes in hand, among all the primes generated,
which are large enough in value to apply the procedures in the proof of
Theorem 1. Of course, we will also have many bad primes where the algorithm
B fails to achieve such a success rate. The main idea is that, through the use of
the random self-reducibility and downward self-reducibility of the permanent (as
in the proof of Theorem 1), for any prime, with high probability we will know
whether the procedures in the proof of Theorem 1 had succeeded or not. The
essential idea is contained in the LFKN protocol. We can amplify the correctness
probability of this identification of good primes exponentially close to one,
namely $> 1 - 2^{-n^c}$ for any constant c. Once we have sufficiently many good
primes identified, we may apply the proof ideas of Theorem 1 and, for a given
matrix M, compute its permanent modulo the good primes, and finally apply
an error-free Chinese remaindering to compute the permanent of the matrix M.
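The final error-free Chinese remaindering step can be sketched as follows, assuming the residues of the permanent modulo distinct good primes are already known and that the product of the primes exceeds twice the largest possible absolute value of the permanent. The function name and the signed-range convention are choices made for this example.

```python
def crt_signed(residues, primes):
    """Recover the integer v with v = residues[i] (mod primes[i]) for all i
    and |v| < (product of primes) / 2, for pairwise distinct primes."""
    value, modulus = 0, 1
    for r, p in zip(residues, primes):
        # Choose t so that value + modulus * t is congruent to r modulo p.
        t = (r - value) * pow(modulus, -1, p) % p
        value += modulus * t
        modulus *= p
    # Map the representative in [0, modulus) to the symmetric range around 0.
    return value - modulus if value > modulus // 2 else value
```

The symmetric range matters because the permanent of an integer matrix may be negative, while residues are reported as non-negative numbers.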

The permanent function has played a pivotal role in complexity theory, e.g.,
see [Lip91], [LFKN90], [IW98]. Frequently, it is a standard technique to use Yao's
XOR Lemma to amplify the unpredictability of some hard function. The theorems
presented here have the interesting feature that, for the permanent function,
no such amplification is needed if one requires only less than inverse polynomial
unpredictability. Perhaps these theorems can be used as an alternative to the
standard amplification technique. The advantage is that we do not need any
replication of the inputs. This has been an important concern in some recent
results due to Impagliazzo and Wigderson [IW98].

Abstract. We demonstrate how to use Lautemann's proof that BPP
is in $\Sigma_2^p$ to exhibit that BPP is in $RP^{PromiseRP}$. Immediate consequences
show that if PromiseRP is easy or if there exist quick hitting
set generators then P = BPP. Our proof vastly simplifies the proofs
of the latter result due to Andreev, Clementi and Rolim and Andreev,
Clementi, Rolim and Trevisan.

Clementi, Rolim and Trevisan question whether the promise is necessary
for the above results, i.e., whether $BPP \subseteq RP^{RP}$ for instance. We give
a relativized world where $P = RP \ne BPP$ and thus the promise is
indeed needed.



1 Introduction 

Andreev, Clementi and Rolim [ACR98] show how given access to a quick hitting
set generator, one can approximate the size of easily describable sets. As an
immediate consequence one gets that if quick hitting set generators exist then
P = BPP. Andreev, Clementi, Rolim and Trevisan [ACRT97] simplify the proof
and apply the result to simulating BPP with weak random sources.

Much earlier, Lautemann [Lau83] gave a proof that $BPP \subseteq \Sigma_2^p = NP^{NP}$,
simplifying work of Gács and Sipser [Sip83]. Lautemann's proof uses two simple
applications of the probabilistic method to get the existence results needed.
As is often the case with the probabilistic method, the proof actually shows
that the overwhelming number of possibilities fulfill the needed requirements.
With this observation, we show that Lautemann's proof puts BPP in the class
$RP^{PromiseRP[1]}$. Since quick hitting set generators derandomize PromiseRP
problems, we get that the existence of quick hitting set generators implies P = BPP.
This greatly simplifies the proofs of Andreev, Clementi and Rolim [ACR98] and
Andreev, Clementi, Rolim and Trevisan [ACRT97].

* Partially supported by the European Union through NeuroCOLT ESPRIT Working
Group Nr. 8556, and HC&M grant nr. ERB4050PL93-0516.

The difference between RP and PromiseRP is subtle but important. In
the class RP we require the probabilistic Turing machine to either reject always
or accept with probability at least one-half for all inputs. In PromiseRP we
only need to solve instances where the machine rejects always or accepts with
probability at least one-half.

A survey paper by Clementi, Rolim and Trevisan [CRT98] asks whether we
can remove the promise in our result, i.e., whether $BPP \subseteq RP^{RP}$. We give
a relativized counterexample to this conjecture by exhibiting an oracle A such
that $P^A = RP^A$ but $P^A \ne BPP^A$. Since virtually all the techniques used in
derandomization relativize, this means that new techniques will be required to
collapse BPP in this way.

2 Definitions 

We assume the reader is familiar with the standard notions of Turing machines, and
deterministic, nondeterministic and probabilistic polynomial-time computation.
We let $\Sigma$ represent the binary alphabet {0, 1}.

A quick hitting set generator finds strings in large easily describable sets.

Definition 1. A quick $\delta$-hitting set generator is a polynomial-time computable
function h mapping $1^n$ to a set of strings of length n such that for all n, if
$f : \Sigma^n \to \{0,1\}$ is a function computed by circuits of at most n gates and
$\Pr_{x \in \Sigma^n}(f(x) = 1) \ge \delta$ then f(x) = 1 for some x in $h(1^n)$.

Andreev, Clementi and Rolim [ACR98] show that for any $\delta, \delta' > 0$, if quick
$\delta$-hitting set generators exist then so do $\delta'$-hitting set generators. We will drop $\delta$
in this case.

We have many variations of probabilistic complexity classes. In this paper, 
we will concern ourselves with RP, BPP, PromiseRP and PromiseBPP. 

Definition 2. A language L is in the class RP if there exists a probabilistic 
polynomial-time Turing machine M such that for all $x \in \Sigma^*$, 

— If x is in L then Pr(M accepts x) $\geq$ 1/2, and 

— If x is not in L then Pr(M accepts x) = 0. 

Sometimes the class RP is denoted simply by R. 

Definition 3. A language L is in the class BPP if there exists a probabilistic 
polynomial-time Turing machine M such that for all $x \in \Sigma^*$, 

— If x is in L then Pr(M accepts x) $\geq$ 2/3, and 

— If x is not in L then Pr(M accepts x) $\leq$ 1/3. 

Languages in RP require machines M that fulfill the requirements of Definition 2 
for all inputs. Sometimes we would like to consider probabilistic machines 
restricted to inputs where the desired requirements hold. We use PromiseRP to 
describe these problems. This does not form a class per se, but we can formally 
define the notions of PromiseRP being easy and oracle access to PromiseRP. 

Definition 4. We say that a language A is RP-consistent with a probabilistic 
polynomial-time Turing machine M if for all $x \in \Sigma^*$, 

— x is in A if Pr(M accepts x) $\geq$ 1/2, and 

— x is not in A if Pr(M accepts x) = 0. 

Note that A may be arbitrary for x such that 0 < Pr(M accepts x) < 1/2. 
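
The notion of RP-consistency can be made concrete on a finite sample of inputs. The following is a minimal sketch, not part of the original paper: the function name `is_rp_consistent` and the idea of passing acceptance probabilities as a table are assumptions made purely for illustration (in general these probabilities are not efficiently computable).

```python
from fractions import Fraction

def is_rp_consistent(A, accept_prob):
    """Check the two conditions of RP-consistency on a finite table.

    A           -- set of strings claimed to be in the language (hypothetical input)
    accept_prob -- dict mapping each input x to Pr(M accepts x) (hypothetical input)
    """
    half = Fraction(1, 2)
    for x, p in accept_prob.items():
        if p >= half and x not in A:
            return False  # M accepts with probability >= 1/2, so x must be in A
        if p == 0 and x in A:
            return False  # M never accepts, so x must not be in A
        # 0 < p < 1/2: the promise fails here, so A is unconstrained
    return True

probs = {"00": Fraction(3, 4), "01": Fraction(0), "10": Fraction(1, 8)}
print(is_rp_consistent({"00"}, probs))        # True: "10" may be left out
print(is_rp_consistent({"00", "10"}, probs))  # True: "10" may also be put in
print(is_rp_consistent({"01"}, probs))        # False
```

Both of the first two sets are RP-consistent with the same machine, which is exactly why PromiseRP does not define a single language.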

Definition 5. We say PromiseRP is easy if for every probabilistic polynomial- 
time Turing machine M there is a set A in P that is RP -consistent with M . 

Using repetition we can reduce the error in Definitions 2-5 to $2^{-q(|x|)}$ for any 
polynomial q. 
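
As a rough illustration of error reduction by repetition, the sketch below computes exact error probabilities for independent repetitions. It assumes the standard amplification schemes (accept if any run accepts in the one-sided case, majority vote in the two-sided case); the function names are illustrative and do not come from the paper.

```python
from fractions import Fraction
from math import comb

def one_sided_error(k, p=Fraction(1, 2)):
    """Probability that k independent runs all reject an input in L,
    when each run accepts with probability p."""
    return (1 - p) ** k

def majority_error(k, p=Fraction(2, 3)):
    """Probability that a majority vote over k runs (k odd) is wrong,
    when each run is correct independently with probability p."""
    return sum(comb(k, j) * p**j * (1 - p) ** (k - j) for j in range(k // 2 + 1))

print(one_sided_error(10))   # 1/1024
print(majority_error(3))     # 7/27
```

In the one-sided case q repetitions already give error at most $2^{-q}$; in the two-sided case the error also decays exponentially in the number of repetitions, with a constant that depends on the gap between 2/3 and 1/2.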

Contrast Definition 5 to Definition 2. In particular we have PromiseRP 
is easy implies P = RP. The converse is not so simply provable; relativized 
counterexamples easily follow from known results on generic oracles [IN88]. The 
oracle we develop in Section 4 also gives a relativizable counterexample. 

Definition 6. For any relativizable complexity class C, L is in $C^{PromiseRP}$ if 
there is a probabilistic polynomial-time Turing machine M such that L is in 
$C^A$ for all A RP-consistent with M. 

We can also define $C^{PromiseRP[k]}$ to allow only k queries to A in Definition 6. 
We can use the notation PromiseBPP in a similar manner. 

One might want to require in Definition 6 that L be in $C^A$ via a fixed machine 
depending only on M. Grollmann and Selman [GS88] show that this restriction 
does not affect Definition 6. For completeness we give a proof of the equivalence 
of the two definitions in Section 5. 

It is not hard to see that there is an easy connection between hitting set 
generators and PromiseRP. 

Fact 1. If there are quick hitting set generators then PromiseRP is easy. 

3 One-Sided Promise Gives BPP 

Theorem 1. $BPP \subseteq RP^{PromiseRP[1]}$. 

Proof: We basically use the proof of Lautemann [Lau83] that BPP is in 
$\Sigma_2$ to prove Theorem 1. 

Let L be a language in BPP and M a probabilistic polynomial-time Turing 
machine accepting L with an error of $2^{-n}$ on inputs of length n. Let q(n) be the 
maximum number of coin tosses on any computation path of M on any input of 
length n. Note q(n) is bounded by a polynomial in n. 

Let A be the set of pairs (x,r) such that |r| = q(|x|) and M(x) using r as its 
random coins will accept. Note that A is computable in deterministic polynomial 
time. We now define the set B as: 

$B = \{(x, z_1, \ldots, z_{q(|x|)}) \mid |z_1| = \cdots = |z_{q(|x|)}| = q(|x|)$ implies there is some 

$w \in \Sigma^{q(|x|)}$ such that $(x, w \oplus z_1) \notin A \wedge \cdots \wedge (x, w \oplus z_{q(|x|)}) \notin A\}$. 

Here $u \oplus v$ for |u| = |v| is the bitwise parity of u and v. 

Note we have $B \in NP$. First we will show that L is in $RP^B$. Our $RP^B$ 
algorithm on input x with n = |x| simply chooses $z_1, \ldots, z_{q(n)}$ independently at 
random from $\Sigma^{q(n)}$ and then accepts if $(x, z_1, \ldots, z_{q(n)})$ is not in B. 

If x is in L then consider a fixed w and i, $1 \leq i \leq q(n)$. The probability that 
$(x, w \oplus z_i)$ is not in A is at most $2^{-n}$. Since the $z_i$'s are chosen independently, 
the chance that $(x, w \oplus z_i)$ is not in A for every $z_i$, $1 \leq i \leq q(n)$, is at most 
$2^{-nq(n)}$. Since there are $2^{q(n)}$ possible w's we have 

$\Pr((x, z_1, \ldots, z_{q(n)}) \in B) \leq 2^{q(n)} \cdot 2^{-nq(n)}$. 

Now suppose that x is not in L. Fix $z_1, \ldots, z_{q(n)}$ and i, $1 \leq i \leq q(n)$. If we 
choose w at random, the probability that $(x, w \oplus z_i)$ is in A is at most $2^{-n}$. The 
probability that $(x, w \oplus z_i)$ is in A for some i is at most $q(n)2^{-n}$ which for sufficiently 
large n is much smaller than 1/2. Thus for every $z_1, \ldots, z_{q(n)}$ of strings of length 
q(n), $(x, z_1, \ldots, z_{q(|x|)})$ is in B. 
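
The membership test for B can be sketched by brute force for toy parameters. This is only an illustration of the shifting argument: the names `xor` and `in_B`, the representation of the slice of A at a fixed x as a set of accepting coin strings, and the use of an arbitrary number of shifts are assumptions of this sketch, and the exhaustive search over w takes exponential time.

```python
from itertools import product

def xor(u, v):
    """Bitwise parity of two equal-length bit tuples."""
    return tuple(a ^ b for a, b in zip(u, v))

def in_B(accepting, shifts, m):
    """True iff some w in {0,1}^m has (w XOR z) rejecting for every shift z.

    accepting -- set of m-bit tuples r on which M(x) accepts (slice of A at x)
    shifts    -- the strings z_1, ..., z_k, each an m-bit tuple
    """
    return any(
        all(xor(w, z) not in accepting for z in shifts)
        for w in product((0, 1), repeat=m)
    )

m = 3
shifts = [(0, 0, 0), (1, 1, 1)]
mostly_accepting = set(product((0, 1), repeat=m)) - {(0, 0, 0)}
mostly_rejecting = {(0, 0, 0)}

print(in_B(mostly_accepting, shifts, m))  # False: the shifts cover every w
print(in_B(mostly_rejecting, shifts, m))  # True: some w avoids all accepting strings
```

When almost all coin strings accept, a few shifts cover every w and the tuple falls outside B; when almost none accept, no small set of shifts can cover everything and the tuple is always in B.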

Now we wish to show that L is in $RP^{PromiseRP[1]}$. We use a set C such that 
C and B agree on tuples where the w is chosen at random and the acceptance 
probability is either zero or greater than one-half. 

More specifically $(x, z_1, \ldots, z_{q(|x|)})$ is in C if 

1. $|z_i| = q(|x|)$ for each i, $1 \leq i \leq q(|x|)$, and 

2. the number of w of length q(|x|) such that 

$(x, w \oplus z_1) \notin A \wedge \cdots \wedge (x, w \oplus z_{q(|x|)}) \notin A$ 

is greater than $2^{q(|x|)-1}$. 

The tuple $(x, z_1, \ldots, z_{q(|x|)})$ is not in C if 

1. $|z_i| = q(|x|)$ for each i, $1 \leq i \leq q(|x|)$, and 

2. there are no w of length q(|x|) such that 

$(x, w \oplus z_1) \notin A \wedge \cdots \wedge (x, w \oplus z_{q(|x|)}) \notin A$. 

The set C can be arbitrary for all other inputs. 

The proof above that L is in $RP^B$ also shows that L is in $RP^C$. □ 

In the proof of Theorem 1, if x is in L and the $z_i$ are badly chosen then the 
number of w such that 

$(x, w \oplus z_1) \notin A \wedge \cdots \wedge (x, w \oplus z_{q(|x|)}) \notin A$ 

might be nonzero yet small. This is why we need PromiseRP instead of just 
RP for this proof. Theorem 3 shows that any relativizable proof would need to 
use PromiseRP. 

From Theorem 1 and its proof we get the following two corollaries. 

Corollary 1. If PromiseRP is easy then P = BPP and PromiseBPP is 
easy. 

Corollary 2 (Andreev-Clementi-Rolim). If quick hitting set generators exist 
then P = BPP. 

The proof of Theorem 1 only uses the set A restricted to the inputs of the 
form (x,r). Thus we can use PromiseBPP is easy instead of just P = BPP in 
Theorem 1 and Corollaries 1 and 2. 

Andreev, Clementi and Rolim [ACR98] prove the following stronger result to 
get Corollary 2. 

Theorem 2 (Andreev-Clementi-Rolim). For any $\epsilon > 0$, there is a 
polynomial-time algorithm that, given access to a quick hitting set generator, and 
given as input a circuit C, returns a value D such that 

$|\Pr(C(x) = 1) - D| < \epsilon$. 

We should note that Theorem 2 also follows from Theorem 1. One just needs to 
notice that distinguishing the possibilities that $\Pr_{x \in \Sigma^n}(C(x) = 1) > D + \epsilon$ and 
$\Pr_{x \in \Sigma^n}(C(x) = 1) < D - \epsilon$ is a PromiseBPP question. 

4 RP Can Be Easy without BPP Being Easy 

In this section we show that Theorem 1 cannot be improved to show that P = R 
implies P = BPP using relativizing techniques. 

Theorem 3. There exists a relativized world where $P = RP \neq BPP$. 

Define the following function: tower(0) = 2, tower(n + 1) = $2^{tower(n)}$, i.e. 
tower(n) is an exponential tower of n + 1 2's. We will use a special type of 
generic (see [FFKL93] for an overview) to prove the theorem. 
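
For concreteness, the tower function can be computed directly for small arguments. This is a small sketch; the last line illustrates the (assumed, but immediate from the definition) fact that the number of strings at the previous tower length equals the current tower length, which is what later makes exhaustive querying of shorter lengths feasible.

```python
def tower(n):
    """tower(0) = 2 and tower(n + 1) = 2 ** tower(n)."""
    value = 2
    for _ in range(n):
        value = 2 ** value
    return value

print([tower(i) for i in range(4)])   # [2, 4, 16, 65536]
print(2 ** tower(2) == tower(3))      # True: 2^16 strings of length 16
```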

Definition 7. A BPP-generic oracle G is a type of generic oracle that is only 
defined at length n such that n = tower(m) for some m. Moreover at these 
lengths it will always be the case that at most 1/3 or more than 2/3 of the 
strings of length n are in G. We will call oracles that satisfy these requirements 
oracles that are BPP-promise. 

The oracle that fulfills the conditions of Theorem 3 will be $QBF \oplus G$ for 
G a BPP-generic. Here QBF is the PSPACE-complete set of true quantified 
boolean formulae. The following lemma shows that the second part of Theorem 3 
is fulfilled. 

Lemma 1. Let G be a BPP-generic. Then $P^{QBF \oplus G} \neq BPP^{QBF \oplus G}$. 

Proof: This follows because G is generic and the condition that $P \neq BPP$ 
can be met under the BPP promise of G. □ 

The more difficult part is to show that $P^{QBF \oplus G} = RP^{QBF \oplus G}$. We need 
the following notion of categoricity. 

Definition 8. A polynomial time nondeterministic machine M is categorically 
R if for all BPP-promise oracles B it is the case that for all x, $M^{QBF \oplus B}(x)$ 
has either more than 1/2 of its paths accepting or none. We will also call these 
machines categorical. 

The idea is to show that if M is categorical then there is a polynomial time 
(relative to QBF) algorithm that computes for all x whether M(x) accepts or 
rejects. The core of this proof will be an argument from Nisan [Nis91]. 

The proof of Theorem 3 follows from Lemmas 2 and 3. Lemma 2 says that 
if we have a machine M(x) that is categorically R and we only consider oracles 
A such that at most 1/6 or at least 5/6 of the strings of length n are in A then 
$M^{QBF \oplus A}(x)$ can be decided in polynomial time relative to $QBF \oplus A$. 

Lemma 2. Fix an input x and let n = |x|. Let M(x) be a categorical machine. 
For any set A that only contains strings of length n with the promise that either 
at most 1/6 or at least 5/6 of the strings of length n are in A, there exists a 
deterministic strategy that determines $M^{QBF \oplus A}(x)$, querying only a fixed 
polynomial number of strings in A. Moreover this strategy can be computed in a fixed 
polynomial time relative to $QBF \oplus A$. 

Proof: We follow the lines of the proof of Nisan [Nis91]. Suppose M runs in 
time p(n). Call any B that fulfills the 1/6, 5/6 promise BPP2-promise. Fix A 
to be any BPP2-promise oracle. 

The deterministic strategy to determine $M^{QBF \oplus A}(x)$ works as follows. 

Let $S_1$ contain all the oracles B such that $M^{QBF \oplus B}(x)$ accepts: 

$S_1 = \{B \mid \Pr(M^{QBF \oplus B}(x) \text{ accepts}) > 0\}$ 

Let $S_0$ contain all the BPP2-promise oracles such that M(x) rejects: 

$S_0 = \{C \mid C \text{ is BPP2-promise and } \Pr(M^{QBF \oplus C}(x) \text{ accepts}) = 0\}$ 

Let $B_1$ be a set in $S_1$. Fix any accepting path $\pi$ of $M^{QBF \oplus B_1}(x)$ with queries 
$q_1, \ldots, q_{p(n)}$ on it and let $b_1, \ldots, b_{p(n)}$ be such that $B_1(q_i) = b_i$. Next query 
$q_1, \ldots, q_{p(n)}$ to A and let $a_1, \ldots, a_{p(n)}$ be the answers (i.e. $a_i = A(q_i)$). If for all 
i it holds that $a_i = b_i$ we know that $M^{QBF \oplus A}(x)$ accepts and we are done. So 
assume that this is not the case. 

At this point we have the following claim: 

Claim. For all $C \in S_0$ at least half of the computation paths of $M^{QBF \oplus C}(x)$ 
query a string in $Q = q_1, \ldots, q_{p(n)}$. 

Proof: Suppose this is not true and that there is a $C \in S_0$ such that less than 
half of the computation paths of $M^{QBF \oplus C}(x)$ query a string in Q. Consider 
the oracle C' which is defined as follows. For all $x \notin Q$, C'(x) = C(x) and 
for $q_i \in Q$, $C'(q_i) = b_i$ (i.e. C' equals C except for the queries in Q where 
it equals $B_1$). Since C was BPP2-promise it follows that C' is BPP-promise. 
Since $M^{QBF \oplus C'}(x)$ has at least one accepting path $\pi$ and it is categorical it 
follows that at least 1/2 of its paths are accepting. On the other hand since 
$M^{QBF \oplus C}(x)$ has no accepting paths and more than half of the computation 
paths do not query anything in Q it follows that less than 1/2 of the paths 
changed and hence that $M^{QBF \oplus C'}(x)$ still rejects. A contradiction. □ 

Next adjust $S_0$ and $S_1$ such that they only contain oracles that agree with 
$A(q_1), \ldots, A(q_{p(n)})$ and repeat the above construction. It follows that in each 
round we learn the answer to a new query that is queried on at least half of 
the computation paths. Suppose after 2p(n) rounds we have not yet encountered 
a proof that $M^{QBF \oplus A}(x)$ accepts. Either all the queries on all the paths of 
$M^{QBF \oplus A}(x)$ have been queried or the current $S_0$ is empty. Let E be the set of 
queries made to A in all the rounds. We will have that $M^{QBF \oplus A}(x)$ accepts if 
and only if $M^{QBF \oplus (A \cap E)}(x)$ has an accepting path. 

To choose the set $B_1$ in each round we need to remember the oracle queries 
previously made to A. It is not hard to see then that this construction can be 
carried out in PSPACE and is reducible to QBF. □ 

Let D be the deterministic strategy that comes out of Lemma 2. The next 
lemma shows that this strategy also works for BPP-promise oracles. 

Lemma 3. For any BPP-promise oracle A, let D be the strategy as described 
in Lemma 2. Then D will compute $M^{QBF \oplus A}(x)$ correctly. 

Proof: Suppose that D does not compute $M^{QBF \oplus A}(x)$ for some BPP-promise 
A. Suppose that A contains at most 1/3 of the strings of length n. The case where 
A contains more than 2/3 of the strings of length n can be handled similarly. 

Suppose D accepted but did not find an accepting path of $M^{QBF \oplus A}(x)$. This 
could only have happened if the final $S_0$ was empty. Let F be a minimal subset 
of A consistent with D's queries to A such that $M^{QBF \oplus F}(x)$ rejects. Since $S_0$ 
is empty, F must contain at least $2^n/6$ strings. Removing any string y from F 
not queried by D will cause $M^{QBF \oplus (F - \{y\})}(x)$ to accept with probability at 
least one-half. Thus every string in F not queried by D must occur on at least 
half of the computation paths of $M^{QBF \oplus F}(x)$, which cannot happen by a simple 
counting argument. 

Thus the only way the strategy can make an error is when D rejects whereas 
$M^{QBF \oplus A}(x)$ accepts. Let $Q = q_1, \ldots, q_{2p(n)^2}$ be the queries made by D, and 
let $R = r_1, \ldots, r_{p(n)}$ be the queries on some accepting path of $M^{QBF \oplus A}(x)$. 
Consider the following set A'. For all $q \in Q$ set A'(q) = A(q), and for $r \in R$ 
set A'(r) = A(r). For all the other strings x set A'(x) = 0. It now follows 
that A' contains at most a polynomial number of strings of length n and is 
BPP2-promise. Moreover since $M^{QBF \oplus A}(x)$ has an accepting path it follows 
that $M^{QBF \oplus A'}(x)$ accepts. But since all the queries made by D will be the same 
for A and A' it follows that D still rejects, contradicting Lemma 2. □ 

Proof (of Theorem 3): By Lemma 1 it follows that $P^{QBF \oplus G} \neq BPP^{QBF \oplus G}$. 
Let M be any categorical machine that runs in time p(n). Let x be any string of 
length l and let m be the biggest m such that tower(m) $\leq$ p(n). Set n = 
tower(m). Query all the relevant strings in G of length strictly less than n. Since 
G is only defined at lengths that are a tower of 2's it follows that the previous 
relevant length is so small that one can query all those strings in polynomial 
time. Next apply Lemma 3 and use QBF to compute $M^{QBF \oplus G}(x)$. The last 
possibility is that $M^{QBF \oplus G}$ happens to be an R machine but it is not categorical. 

This however cannot happen since the genericity of G will diagonalize against 
such non-categorical machines. (See [BI87].) □ 

Theorem 1 in combination with Theorem 3 gives a relativized world where 
PromiseRP is not easy but P = RP. This corollary also follows from work of 
Impagliazzo and Naor [IN88]. 

Heller [Hel86] exhibits a relativized world where BPP = NEXP. One might 
suspect that the techniques of Heller and those used in the proof of Theorem 3 
may lead to an oracle A where $P^A = RP^A$ and $BPP^A = NEXP^A$. We show 
this cannot happen. 

Theorem 4. In all relativized worlds, if P = RP and $NP \subseteq BPP$ then P = 
BPP. 

Proof: Zachos [Zac88] shows that if $NP \subseteq BPP$ then NP = RP. We then 
have $P = NP = \Sigma_2^p$ and thus P = BPP. These arguments all relativize. □ 



5 Relativizing to PromiseRP 



Definition 6 may allow the machine that exhibits L in $C^{PromiseRP}$ to depend 
on A instead of just the underlying probabilistic machine. Grollmann and Selman 
[GS88] give a general result that implies that disallowing this dependence 
does not change the class $C^{PromiseRP}$. For completeness we give a proof of this 
result. 

For simplicity we will show the equivalence for the class $P^{PromiseRP}$. The 
proof works similarly for many other natural classes such as $RP^{PromiseRP}$, 
$NP^{PromiseRP}$, $RP^{PromiseRP[1]}$ and $P^{PromiseBPP}$. 

Theorem 5 (Grollmann-Selman). For every language L the following 
are equivalent: 

1. L is in $P^{PromiseRP}$, i.e., there exists a probabilistic polynomial-time Turing 
machine M such that for all A RP-consistent with M, there is a polynomial-time 
oracle Turing machine N such that $L = L(N^A)$. 

2. There exist a probabilistic polynomial-time Turing machine M and a 
polynomial-time oracle Turing machine N such that for all A RP-consistent 
with M, $L = L(N^A)$. 

Proof: (2) is more restrictive than (1). We have to show that (1) implies 
(2). Fix L in $P^{PromiseRP}$ and M that witnesses this. 

Let D be the set of x such that M(x) accepts with probability zero or 
probability at least one-half. Let E be the set of x such that M(x) accepts with 
probability at least one-half. We have that A is RP-consistent with M if and 
only if $A \cap D = E$. 

Let us assume that (2) fails for M, i.e., for every polynomial-time Turing 
machine N there is an A such that $A \cap D = E$ and $L \neq L(N^A)$. We will create 
a set B with $B \cap D = E$ such that for all polynomial-time Turing machines N, 
$L \neq L(N^B)$. This contradicts the fact that M witnesses L in $P^{PromiseRP}$. 

Let $N_1, N_2, \ldots$ be an enumeration of the polynomial-time oracle Turing 
machines. 

We create B in stages; in each stage we give a partial setting of whether some 
strings are or are not in B. Let $B_0$ be the oracle where all strings in E are put 
in $B_0$ and all strings in D − E are put out of $B_0$. Let $m_0 = 0$. 

Our goal at stage i will be to guarantee that for any oracle A extending $B_i$, 
$L \neq L(N_i^A)$. At the end of stage i we will have all strings of length less than $m_i$ 
defined in $B_i$ and only the strings in D of length greater than i will be defined. 

Stage i + 1: 

Claim. There exists an RP-consistent A extending $B_i$ such that $L \neq L(N_i^A)$. 

Proof: Suppose not. Create machine N' that simulates $N_i$ except that on 
oracle queries of length less than $m_i$, N' will answer them according to $B_i$. Let 
C be any RP-consistent language. Then $N'^C$ will simulate $N_i^F$ where 

$F = (B_i \cap \Sigma^{<m_i}) \cup (C \cap \Sigma^{\geq m_i})$. 

Since $C \cap D = E$ we have that F extends $B_i$. By the assumption that the claim 
fails we have $L(N'^C) = L(N_i^F) = L$. We now have that $L(N'^C) = L$ for all 
RP-consistent C, contradicting the assumption that (2) fails. □ 

Fix an RP-consistent A and an x such that $x \in L \Leftrightarrow x \notin L(N_i^A)$. Let $m_{i+1}$ 
be one more than the length of the longest oracle query made by $N_i^A(x)$ and let $B_{i+1}$ 
be the extension of $B_i$ where all strings of length less than $m_{i+1}$ are set according 
to A. □ 

An Optimal Competitive Strategy for Walking in Streets 

Abstract. We present an optimal strategy for searching for a goal in 
a street which achieves the competitive factor of $\sqrt{2}$, thus matching the 
best lower bound known before. This finally settles an interesting open 
problem in the area of competitive path planning that many authors have 
been working on. 

Key words: Computational geometry, autonomous robot, competitive 
strategy, LR-visibility, on-line navigation, path planning, polygon, street. 



1 Introduction 

In the last decade, the path planning problem of autonomous mobile systems 
has received a lot of attention in the communities of robotics, computational 
geometry, and on-line algorithms; see e.g. Rao et al., Blum et al., and 
the upcoming surveys by Mitchell and Berman. 

Among the basic problems is searching for a goal in an unknown environment. 
One is interested in strategies that are correct, in that the goal will always be 
reached whenever this is possible, and in performance guarantees that allow us 
to relate the length of the robot’s path to the length of the shortest path from 
start to goal, or to other measures of the complexity of the scene. 

It is well known that there are some differences between the outdoor setting, 
where the robot has to circumnavigate a set of compact obstacles in order to get 
to the target, and the indoor setting where the obstacles are situated in a — not 
necessarily rectangular — room whose walls may further impede the robot; see 
e.g. Angluin et al. Therefore, it is reasonable to study the indoor problem in 
its most simple form, that is, where the walls of the room are the only obstacles 
the robot has to cope with. 

Suppose a point-shaped mobile robot equipped with a 360° vision system is 
placed inside a room whose walls are modeled by a simple polygon. Neither the 
floorplan nor the position of the target point are known to the robot. As the 
robot moves around it can build a partial map of those parts that have so far 
been visible. Also, it will recognize the target point on sight. 

It is quite easy to see that in arbitrary simple polygons no strategy can guar- 
antee a search path at most a constant times as long as the shortest path from 
start to goal. The question arose if there are subclasses of polygons for which a 
constant performance ratio can be achieved. To this end Klein introduced the 
concept of streets. A polygon P with two distinguished vertices s and t is called a 
street if the two boundary chains leading from s to t are mutually weakly visible, 
i.e. if each point on one of the chains can see at least one point of the other. 
Equivalently, from each s-to-t path inside P each point of the polygon is at least 
once visible. Klein provided a competitive strategy for searching for the target 
point, t, of a street, starting from s. He proved an upper bound of 5.72 for the 
ratio of the length of the robot's path over the length of the shortest path from 
s to t in P. Also, it was shown that no strategy can achieve a competitive ratio 
of less than $\sqrt{2} \approx 1.41$. This lower bound applies to randomized strategies, too. 

* This work was supported by the Deutsche Forschungsgemeinschaft, grant K1 655/8-3. 

Since then, the street problem has attracted considerable attention. Some 
research was devoted to structural properties. Tseng et al. have shown how 
to report all pairs of vertices (s,t) of a given polygon for which it is a street; 
for star-shaped polygons many such vertex pairs exist. Das et al. have 
improved on this result by giving an optimal linear time algorithm. Ghosh and 
Saluja have described how to walk an unknown street incurring a minimum 
number of turns. 

Other research addressed the gap between the $\sqrt{2}$ lower bound and the first 
upper bound of 5.72 known for the class of street polygons. The upper bound 
was lowered to 4.44 in Icking [7], then to 2.61 in Kleinberg, to 2.05 in Lopez- 
Ortiz and Schuierer, to 1.73 in Lopez-Ortiz and Schuierer, to 1.57 in 
Semrau, and to 1.51 in Icking et al. 

But it has remained open, until now, if $\sqrt{2}$ is really the largest lower bound, 
and how to design an optimal strategy for searching the target in a street; 
compare the open problems mentioned in Mitchell. 

In this paper both questions are finally answered. We introduce a new 
strategy and prove that the search path it generates, in any particular street, is at 
most $\sqrt{2}$ times as long as the shortest path from s to t. This result makes the 
street problem one of the few problems in on-line navigation whose competitive 
complexity is precisely known (the only other example we are aware of is the 
result by Baeza-Yates et al. on multiway search). 

One might wonder if this paper is but another small step in a chain of tech- 
nical improvements. We do not think so, for the following reason. Unlike many 
approaches discussed in previous work, the optimal strategy we are presenting 
here is not an artifact. Rather, its definition is well motivated by backward 
reasoning. 

The crucial subproblem can be parametrized by a single angle, $\phi$. For each 
possible value of $\phi$ a lower bound can be established, see Sect. 3.1. For the 
maximum value $\phi = \pi$ the existence of a strategy matching this bound is obvious. 
We state a requirement in Sect. 3.2 that would allow us to extend an optimal 
strategy from a given value of $\phi$ to smaller values. From this requirement we can 
infer how the strategy should proceed; see Sect. 3.3. 

After this work was finished, and made publicly available via the Internet, we 
learned that Schuierer and Semrau have simultaneously and independently 
studied the same strategy. However, their analytic approach is quite different 
from our proof. 
2 Definitions and Known Properties 

We briefly repeat necessary definitions and known facts, mostly from |2j. 

A simple polygon P is considered as a room, the edges are opaque walls. Two 
points are mutually visible, i.e. see each other, if the connecting line segment is 
contained within P. As usual, two sets of points are said to be mutually weakly 
visible if each point of one set can see at least one point of the other set. 

Definition 1. A simple polygon P in the plane with two distinguished vertices s 
and t is called a street if the two boundary chains from s to t are weakly mutually 
visible; see Fig. 1. Streets are sometimes also denoted as LR-visible polygons, 
where L denotes the left and R the right boundary chain from s to t. 

A strategy for searching a goal in an unknown street is an on-line algorithm 
for a mobile system (robot), modeled by a point, that starts at vertex s, moves 
around inside the polygon and eventually arrives at the goal t. The robot is 
equipped with a vision system that provides the visibility polygon, vis(x), for 
the actual position, x, at each time, and everything which has been visible is 
memorized. When the goal becomes visible the robot goes there and its task is 
accomplished. 

Compared to the shortest path, SP, from s to t inside P, it seems clear that 
most of the time a detour is unavoidable. Our aim is to bound that detour. 

Definition 2. A strategy for searching a goal in a street is competitive with 
factor c (or c-competitive, for short) if its path is never longer than c times the 
length of the shortest path from s to t. 
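
To make the competitive factor concrete, the following sketch compares the length of a robot path with the length of the shortest path, both given as polylines. The function names and the example coordinates are assumptions chosen for illustration; they are not taken from the paper.

```python
import math

def path_length(points):
    """Total Euclidean length of a polyline given as a list of (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def competitive_ratio(robot_path, shortest_path):
    """Length of the robot's path divided by the length of the shortest path."""
    return path_length(robot_path) / path_length(shortest_path)

# Hypothetical example: the robot walks along two legs of a right isosceles
# triangle while the shortest path is the hypotenuse.
robot = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
shortest = [(0.0, 0.0), (1.0, 1.0)]
print(competitive_ratio(robot, shortest))  # approximately 1.4142
```

A strategy is c-competitive when this ratio never exceeds c over all streets and all goal positions, so a single example like this one only gives a lower bound on the factor of the strategy that produced it.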

The shortest path from the startpoint s to the goal t inside a simple 
polygon P, which only turns at reflex^1 vertices of P, is a useful guide for any strategy. 
At each time, either the next vertex on the shortest path to t is known and there 
is no question where to go, or there is some uncertainty, but we will see that 
only two candidates remain for the next vertex on the shortest path to t. Each 
part of the polygon which has never been visible is called a cave, and each cave is 
hidden behind a reflex vertex. Such a reflex vertex v that causes a cave is called a 
left reflex vertex if its adjacent segments on P lie to the left of the ray from the 
actual position of the robot through v, and analogously for right vertices. 

First, we consider the situation at the beginning. From the startpoint s we 
order clockwise around s the set of the left and right reflex vertices; obviously 
they appear in the same clockwise order on the boundary of P. As seen from s, let 
$v_l$ be the clockwise most advanced left reflex vertex and $v_r$ the counterclockwise 
most advanced right reflex vertex, see Fig. 1. 

^1 A reflex vertex is one whose internal angle exceeds 180°. 

Fig. 1. Typical situations in streets. 

If they exist, vertex $v_l$ belongs to the left chain L, and $v_r$ belongs to the 
right chain R. Only the following situations occur. If both $v_l$ and $v_r$ exist, see 
Fig. 1(i), then the goal has to be in one of the caves behind $v_r$ and $v_l$, thus SP 
passes over either $v_r$ or $v_l$. If there is no vertex $v_r$, see Fig. 1(ii), then the goal 
has to be inside the cave behind $v_l$, and the robot moves straight towards $v_l$ 
along SP. We proceed correspondingly if only $v_r$ exists. 

To prove these properties assume that w.l.o.g. the goal is inside the cave 
behind a left reflex vertex $v_l^2$, see Fig. 1(ii), and $v_l^2$ appears before $v_l$ clockwise 
from s. Then the boundary of P inside the cave P' behind vertex $v_l$ would belong 
to the right chain R. The extension, $E(e_l)$, of the invisible clockwise adjacent 
edge, $e_l$, of $v_l$ cannot hit the left chain L. Therefore no point inside the cave P' 
can see a part of L, a contradiction to the street property. 

By the same arguments we can prove that the counterclockwise angle $\phi > 0$ 
between $s v_r$ and $s v_l$ is always smaller than $\pi$, see Fig. 1(i). Therefore in the 
vicinity of s the robot should always walk into the triangle $v_l\, s\, v_r$ to avoid 
unnecessary detours to $v_r$ and $v_l$. 

Now we look at the general situation. We assume that a strategy has led 
the robot to an actual position somewhere in the polygon. We will see that the 
properties discussed for the start essentially remain valid. Vertices $v_l$ and $v_r$ are 
defined as before, i.e. $v_l \in L$ is the clockwise most advanced left reflex vertex 
and $v_r \in R$ the counterclockwise most advanced right reflex vertex. 

There is no reason for a strategy to lose the current $v_l$ or $v_r$ out of sight, 
so we assume that $v_l$ and $v_r$ are always visible, as long as they exist. As already 
discussed above, the only non-trivial case is if both $v_l$ and $v_r$ actually exist. We 
call this a funnel situation. The angle, $\phi$, between the directions from the actual 
position to $v_l$ and to $v_r$ is called the opening angle; it is always smaller than $\pi$. 

While exploring P in a funnel situation, sequences of reflex vertices $v_l \in 
\{v_l^1, v_l^2, \ldots, v_l^m\}$ and $v_r \in \{v_r^1, v_r^2, \ldots, v_r^n\}$ occur until the funnel situation ends, 
see e.g. point q in Fig. 1(i). If at this time only $v_l = v_l^m$ exists (analogously for 
$v_r$) then we know that the goal t is contained in the cave of $v_l$, we walk to $v_l$, 
and the left convex chain $v_l^1 v_l^2 \ldots v_l^m$ belongs to SP. 

So any reasonable strategy will proceed in the following way. If the goal is 
visible or only one of $v_l$ and $v_r$ exists, then walk into that direction. Otherwise 
we have a funnel situation; we choose a walking direction within the opening 
angle, i.e. between $v_l$ and $v_r$, and repeat this continuously until the first case 
applies again. 

It is important to note that the robot's current position is a vertex of 
the shortest path SP whenever a funnel situation newly appears and when the 
next vertex has been reached after the funnel situation was solved. For example, 
at point q in Fig. 1(i) it is clear that we have to go to vertex $v_l^2 \in SP$ where 
the next funnel situation starts. Therefore, if a strategy achieves a competitive 
factor c in each funnel situation (i.e. compared to the shortest path between the 
two visited vertices of SP) then it achieves the same factor in arbitrary streets. 




Fig. 2. A funnel. 



As a consequence, we can restrict our attention to very special polygons, 
the so-called funnels. A funnel consists of two chains of reflex vertices with a 
common start point s, see Fig. 2 for an example. The two reflex chains end in 
vertices $t_l$ and $t_r$, respectively, and the line segment $t_l t_r$ closes the polygon. A 
funnel polygon represents a funnel situation in which the goal t lies arbitrarily 
close behind either $t_l$ or $t_r$ and the strategy will know which case applies only 
when the line segment $t_l t_r$ is reached. For analyzing a strategy, both cases have 
to be considered and the worse of them determines the competitive factor. Other 
funnel situations which end with a smaller opening angle or where the goal is 
further away from $t_l$ or $t_r$ will produce a smaller detour. 

Since the walking direction is always within the opening angle, $\phi$ is strictly 
increasing. It starts at the angle, $\phi_0$, between the two edges adjacent to s, and 
reaches, but never exceeds, 180° when finally the goal becomes visible. By this 
property, it is natural to take the opening angle $\phi$ for parameterizing a strategy. 

We can further restrict ourselves to consider only funnels with initial opening 
angle $\phi_0 \geq$ 90°. As was shown in earlier work, any strategy which achieves a factor 
$\geq \sqrt{2}$ for all funnels with $\phi_0 \geq$ 90° can be adapted to the general case without 
changing its factor in the following way. First, we walk along the angular bisector 
of the current pair $v_l$ and $v_r$ until an opening angle of $\pi/2$ is reached. Then we 
proceed with the given strategy. 



3 A Strategy Which Always Takes the Worst Case into 
Account 

3.1 A Generalized Lower Bound 

We start with a generalized lower bound for initial opening angles $\geq$ 90°. For an 
arbitrary angle $\phi$, let $K_\phi := \sqrt{1 + \sin\phi}$. 

Lemma 1. Assume an initial opening angle $\phi_0 \geq \pi/2$. Then no strategy can 
guarantee a smaller competitive factor than $K_{\phi_0}$. 

Proof. We take an isosceles triangle with angle $\phi_0$ at vertex s and other vertices 
$t_l$ and $t_r$. The goal becomes visible only when the line segment $t_l t_r$ is reached. If 
this happens to the left of its midpoint m then the goal may be to the right, and 
vice versa. In any case the path length is at least the distance from s to m plus 
the distance from m to $t_l$. For the ratio, c, of the path length to the shortest path 
we obtain by simple trigonometry $c \geq \cos(\phi_0/2) + \sin(\phi_0/2) = \sqrt{1 + \sin\phi_0} = K_{\phi_0}$. □ 

For (/iQ = §, we have the well-known lower bound of y/2 stemming from a 
rectangular isoceles triangle jOj. Remark also that the bound also applies 
for any non-symmetric situation, since at the start the funnel is unknown except 
for the two edges adjacent to s and it may turn into a nearly symmetric case 
immediately after the start. This means in other words that for an initial opening 
angle (f>o a competitive factor of is always the best we can hope for. 
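The trigonometric step in the proof of Lemma 1 can be checked numerically. The following is a minimal sketch, assuming an isosceles triangle with unit legs; the function names `k_factor` and `triangle_ratio` are illustrative and not part of the paper.

```python
import math


def k_factor(phi: float) -> float:
    """Lower bound K_phi = sqrt(1 + sin(phi)) for an opening angle phi (radians)."""
    return math.sqrt(1.0 + math.sin(phi))


def triangle_ratio(phi0: float) -> float:
    """Detour ratio in an isosceles triangle with apex angle phi0 and unit legs.

    The path goes from the apex s to the midpoint m of the base and then
    to one base corner; the shortest path to that corner has length 1.
    """
    s_to_m = math.cos(phi0 / 2.0)   # height of the triangle
    m_to_t = math.sin(phi0 / 2.0)   # half of the base
    return (s_to_m + m_to_t) / 1.0


if __name__ == "__main__":
    for degrees in (90, 120, 150, 180):
        phi0 = math.radians(degrees)
        print(degrees, round(k_factor(phi0), 6), round(triangle_ratio(phi0), 6))
```

Both columns agree for every angle, and the value at 90° is $\sqrt{2}$, the classical lower bound.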

In the following we will develop a strategy which achieves exactly this factor. 



3.2 Sufficient Requirements for an Optimal Strategy 

In a funnel with opening angle $\pi$ the goal is visible and there is a trivial strategy 
that achieves the optimal competitive factor $K_\pi = 1$. So we look backwards to 
decreasing angles. 

Let us assume for the moment that the funnel is a triangle, and that we 
have a strategy with a competitive factor of $K_{\phi_2}$ for all triangular funnels with 
initial opening angle $\phi_2$. How can we extend this to initial opening angles $\phi_1$ 
with $\pi \ge \phi_2 > \phi_1 \ge \pi/2$? 

Starting with an angle $\phi_1$ at point $p_1$ we walk a certain path of length w 
until we reach an angle of $\phi_2$ at point $p_2$ from where we can continue with the 
known strategy; see Fig. 3. The left and right reflex vertices, $v_l$ and $v_r$ as defined 
in Sect. 2, do not change. 

Fig. 3. Getting from angle $\phi_1$ to $\phi_2$. 



Let $l_1$ and $l_2$ denote the distances from $p_1$ resp. $p_2$ to $v_l$ at the left side and $r_1$ 
and $r_2$ the corresponding distances at the right. If $t = v_l$ then the path length 
from $p_1$ to t is not greater than $w + K_{\phi_2} l_2$. If now $K_{\phi_1} l_1 \ge w + K_{\phi_2} l_2$ holds and 
the analogous inequality $K_{\phi_1} r_1 \ge w + K_{\phi_2} r_2$ for the right side, which can also 
be expressed as 

$$w \le \min\bigl(K_{\phi_1} l_1 - K_{\phi_2} l_2,\; K_{\phi_1} r_1 - K_{\phi_2} r_2\bigr), \qquad (1)$$

then we have a competitive factor not bigger than $K_{\phi_1}$ for triangles with initial 
opening angle $\phi_1$. 

Note that condition (1) is additive in the following sense. If it holds for a 
path $w_{12}$ from $\phi_1$ to $\phi_2$ and for a continuing path $w_{23}$ from $\phi_2$ to $\phi_3$ then it is 
also true for the combined path $w_{12} + w_{23}$ from $\phi_1$ to $\phi_3$. This will turn out to 
be very useful: if (1) holds for arbitrarily small, successive steps w then it is also 
true for all bigger ones. 
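The additivity of condition (1) follows because the two terms inside the minimum telescope. The sketch below illustrates this with made-up states; the triples (angle, left distance, right distance) are hypothetical numbers chosen only for illustration, not values from an actual funnel, and the helper names are assumptions.

```python
import math


def k_factor(phi: float) -> float:
    """K_phi = sqrt(1 + sin(phi))."""
    return math.sqrt(1.0 + math.sin(phi))


def slack(state_a, state_b) -> float:
    """Right-hand side of condition (1): the longest step w that is allowed.

    Each state is a tuple (phi, l, r): opening angle and the distances
    to the current left and right reflex vertex.
    """
    phi_a, l_a, r_a = state_a
    phi_b, l_b, r_b = state_b
    k_a, k_b = k_factor(phi_a), k_factor(phi_b)
    return min(k_a * l_a - k_b * l_b, k_a * r_a - k_b * r_b)


def condition_one(w: float, state_a, state_b) -> bool:
    """True if a step of length w from state_a to state_b satisfies (1)."""
    return w <= slack(state_a, state_b)


if __name__ == "__main__":
    # Hypothetical states, for illustration only.
    s1, s2, s3 = (1.6, 1.0, 0.9), (2.0, 0.8, 0.75), (2.5, 0.6, 0.58)
    w12, w23 = slack(s1, s2), slack(s2, s3)
    # Two maximal admissible steps combine to an admissible longer step.
    print(w12 + w23 <= slack(s1, s3) + 1e-12)
```

The inequality min(a1, b1) + min(a2, b2) ≤ min(a1 + a2, b1 + b2) is all that is used, which is why the check succeeds for any choice of states.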




Fig. 4. When reaching $p_2$, the most advanced visible point to the 
left jumps from $v_l$ to $v_l'$. 



Now let us go further backwards and observe what happens if the current $v_l$ 
or $v_r$ change. We assume that condition (1) holds for path w from $p_1$ to $p_2$ and 
that $v_l$ changes at $p_2$; see Fig. 4. The visible left chain is extended by $l_2'$. Nothing 
changes on the right side of the funnel, and for the left side of the funnel we have 

$$w \le K_{\phi_1} l_1 - K_{\phi_2} l_2 \le K_{\phi_1}(l_1 + l_2') - K_{\phi_2}(l_2 + l_2') . \qquad (2)$$

The last inequality holds because $K_\phi = \sqrt{1 + \sin\phi}$ is decreasing with 
increasing $\phi$. Here, $l_1 + l_2'$ and $l_2 + l_2'$ are the lengths of the shortest paths from $p_1$ 
and $p_2$ to $v_l'$, respectively. But (2) in fact means that (1) remains valid even if 
changes of $v_l$ or $v_r$ occur. 

Under the assumption that (1) holds for all small steps where $v_l$ and $v_r$ do 
not change we can make use of the additivity of (1) and obtain the following for 
the path length, W, from an initial opening angle $\phi_0$ to the point $p_{end}$ where 
the line segment $t_l t_r$ is reached; see Fig. El 

$$W \le \min\bigl(K_{\phi_0}\cdot(\text{length of left chain}) - K_\pi\,|p_{end}\,t_l|,\;\; K_{\phi_0}\cdot(\text{length of right chain}) - K_\pi\,|p_{end}\,t_r|\bigr)$$

But, since $K_\pi = 1$, this inequality exactly means that we have a competitive 
factor not bigger than $K_{\phi_0}$. It only remains to find a curve that fulfills (1) for 
small steps. 



3.3 Developing the Curve and Checking the Requirements 

One could try to fulfill condition (1) by analyzing, for fixed $p_1$, $\phi_1$, and $\phi_2$, which 
points $p_2$ meet that requirement. To avoid this tedious task, we argue as follows. 
For fixed $\phi_2$, the point $p_2$ lies on a circular arc through $v_l$ and $v_r$. While $p_2$ moves 
along this arc, the length $l_2$ is strictly increasing while $r_2$ is strictly decreasing. 
Therefore, we maximize our chances to fulfill (1) if we require 

$$K_{\phi_2} l_2 - K_{\phi_1} l_1 = K_{\phi_2} r_2 - K_{\phi_1} r_1 \quad\text{or}\quad K_{\phi_2}(l_2 - r_2) = K_{\phi_1}(l_1 - r_1) . \qquad (3)$$



In other words: if we start with initial values $l_0$, $r_0$, we have a fixed constant 
$A := K_{\phi_0}(l_0 - r_0)$ and for any $\phi_0 \le \phi \le \pi$ with corresponding lengths $l_\phi$ and $r_\phi$, 
we want that 

$$K_\phi (l_\phi - r_\phi) = A . \qquad (4)$$

In the symmetric case $l_0 = r_0$ this condition means that we walk along the 
bisector of $v_l$ and $v_r$. Otherwise condition (4) defines a nice curve which can be 
determined in the following way. We choose a coordinate system with horizontal 
axis $v_l v_r$, the midpoint being the origin. We scale such that the distance from $v_l$ 
to $v_r$ equals 1. 

W.l.o.g. let $l_0 > r_0$. For any $\phi_0 \le \phi < \pi$ the corresponding point of the 
curve is the intersection of the hyperbola and the circle given by 

$$\frac{X^2}{\left(\frac{A}{2K_\phi}\right)^2} - \frac{Y^2}{\frac{1}{4} - \left(\frac{A}{2K_\phi}\right)^2} = 1 \quad\text{and}\quad X^2 + \left(Y + \frac{\cot\phi}{2}\right)^2 = \frac{1}{4\sin^2\phi} .$$

Solving the equations gives, after some transformations, the following solutions. 

$$X(\phi) = \frac{A}{2}\cdot\frac{\cot\frac{\phi}{2}}{1 + \sin\phi}\cdot\sqrt{\left(1 + \tan\frac{\phi}{2}\right)^2 - A^2} \qquad (5)$$

$$Y(\phi) = \frac{1}{2}\cot\frac{\phi}{2}\left(\frac{A^2}{1 + \sin\phi} - 1\right) \qquad (6)$$

Since $A < \sqrt{1 + \sin\phi}$ holds, the functions $X(\phi)$ and $Y(\phi)$ are well defined 
and continuous and the curve is contained in the triangle defined by $\phi_0$, $l_0$, and $r_0$. 

Fig. 5 shows what these curves look like for all possible values of $\phi$ and A 
and also for $l_0 < r_0$. All points with an initial opening angle of $\pi/2$ lie on the 
lower half circle. Two cases can be distinguished. For $A < 1$ the curves can be 
continuously completed to an endpoint on the line $v_l v_r$ with $X(\pi) = A/2$ and 
$Y(\pi) = 0$ where also (4) is fulfilled. For $A > 1$ the curves end up in $v_l$ and $v_r$, 
resp., with parameter $\phi < \pi$ given by $\sin\phi = A^2 - 1$. The curves for the limiting case 
$A = 1$ are emphasized with a thick line in Fig. 5. 

Fig. 5. The curves fulfilling condition (4) for all values of $\phi$ and A. 

For lack of space we give only a sketch of the proof that the given curve 
fulfills condition (1). Because of the additive property of (1) it is sufficient to 
verify this for very small intervals. The arc length of the curve from angle $\phi$ 
to $\phi + \epsilon$ has to be compared to the right side of (1). Because of (3) the min can 
be dropped, and (1) transforms to 

$$\int_{\phi}^{\phi+\epsilon} \sqrt{X'(\theta)^2 + Y'(\theta)^2}\; d\theta \;\le\; K_\phi l_\phi - K_{\phi+\epsilon} l_{\phi+\epsilon} \quad\text{for all } \epsilon > 0 .$$

This can now be verified by inserting (5) and (6) and by making use of 
well-known facts from planar geometry and analysis. 

To summarize, our strategy for searching a goal in an unknown street works 
as follows. 

Strategy WCA (worst case aware): 

If the initial opening angle is less than 90° walk along the angular bisector 
of $v_l$ and $v_r$ until a right opening angle is reached; see the end of Sect. 2. 
Depending on the actual parameters $\phi_0$, $l_0$, and $r_0$, walk along the 
corresponding curve given by (5) and (6) until one of $v_l$ and $v_r$ changes. 
Switch over to the curve corresponding to the new parameters $\phi_1$, $l_1$, 
and $r_1$. Continue until the line $t_l t_r$ is reached. 
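The curve followed by strategy WCA can be evaluated and checked against its defining properties. The sketch below assumes the closed forms $X(\phi) = \frac{A}{2}\cdot\frac{\cot(\phi/2)}{1+\sin\phi}\sqrt{(1+\tan(\phi/2))^2 - A^2}$ and $Y(\phi) = \frac{1}{2}\cot(\phi/2)\bigl(\frac{A^2}{1+\sin\phi} - 1\bigr)$ for (5) and (6), with $v_l = (-1/2, 0)$ and $v_r = (1/2, 0)$; the function names are illustrative.

```python
import math


def k_factor(phi: float) -> float:
    """K_phi = sqrt(1 + sin(phi))."""
    return math.sqrt(1.0 + math.sin(phi))


def wca_point(phi: float, a: float):
    """Curve point for opening angle phi and constant a (assumed forms of (5), (6))."""
    cot_half = 1.0 / math.tan(phi / 2.0)
    tan_half = math.tan(phi / 2.0)
    x = 0.5 * a * cot_half / (1.0 + math.sin(phi)) * math.sqrt((1.0 + tan_half) ** 2 - a * a)
    y = 0.5 * cot_half * (a * a / (1.0 + math.sin(phi)) - 1.0)
    return x, y


def invariant_and_angle(phi: float, a: float):
    """Return K_phi * (l - r) and the opening angle measured at the curve point."""
    x, y = wca_point(phi, a)
    l = math.hypot(x + 0.5, y)   # distance to v_l = (-1/2, 0)
    r = math.hypot(x - 0.5, y)   # distance to v_r = (+1/2, 0)
    dot = (-0.5 - x) * (0.5 - x) + y * y
    angle = math.acos(dot / (l * r))
    return k_factor(phi) * (l - r), angle


if __name__ == "__main__":
    a = 0.5
    for phi in (1.7, 2.2, 2.9):
        inv, angle = invariant_and_angle(phi, a)
        print(phi, round(inv, 9), round(angle, 9))
```

For each sampled angle the first printed value reproduces the constant A from condition (4) and the second reproduces $\phi$, so the point lies on both the hyperbola and the circle.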



Theorem 1. By using strategy WCA we can search a goal in an unknown street 
with a competitive factor of at most $\sqrt{2}$. This is optimal. 

4 Conclusions 

We have developed a competitive strategy for walking in streets which 
guarantees an optimal factor of at most $\sqrt{2}$ in the worst case, thereby settling an old 
open problem. 

Furthermore, the strategy behaves even better for an initial opening 
angle $\phi_0 > 90^\circ$, in which case an optimal factor $K_{\phi_0} = \sqrt{1 + \sin\phi_0}$ between 1 
and $\sqrt{2}$ is achieved. 

The idea for this strategy comes from the generalized lower bound in Lemma 1 
and from the two conditions m and o, which are not strictly necessary for the 
optimal competitive factor but turn out to be very useful. Therefore, we do not 
claim that this is the only optimal strategy. It would be interesting to know 
whether there are substantially different but also optimal strategies. 

Abstract. We consider the problem of a robot searching for an 
unknown, yet visually recognizable target in a street. A street is a simple 
polygon with start and target on the boundary so that the two boundary 
chains between them are weakly mutually visible. We are interested in 
the ratio of the search path length to the shortest path length which 
is called the competitive ratio of the strategy. We present an optimal 
strategy whose competitive ratio matches the known lower bound of $\sqrt{2}$, 
thereby closing the gap between the lower bound and the best known 
upper bound. 



1 Introduction 



A fundamental problem in robot motion planning is to search for a target in 
an unknown environment. We consider a robot with an on-board vision system 
that can identify the target on seeing it. The robot’s information consists of the 
local visibility maps it has obtained so far. Thus the search strategy performs 
on-line, and the method of competitive analysis as introduced by Sleator and 
Tarjan m can be applied to measure its quality. A strategy is c-competitive or 
has a competitive ratio (or factor) of c if its cost does not exceed the cost of 
an optimal solution times c. In our context, the distance traveled by the robot 
must not exceed c times the shortest path length. Competitive on-line searching 
has been investigated in various settings such as searching in special classes of 
simple polygons |2ldffill5lltill7| and among convex obstacles |5] . 

We restrict ourselves to searching in so-called streets which were introduced 
by Klein as the first environment that allows searching with a constant 
competitive ratio p. Klein presents the strategy lad which is based on the idea 
of minimizing the local absolute detour. He gives an upper bound on its 
competitive ratio of ≈ 5.71. The upper bound on the competitive factor was later 
improved by Icking to ≈ 4.44. 

A number of other strategies have been presented since by Kleinberg, 
Lopez-Ortiz and Schuierer [ 11 ‘^11 .‘111 4j and Semrau m. The currently best known 
competitive ratio is ≈ 1.514 jZj. In this paper, we present an optimal strategy 
with competitive ratio $\sqrt{2} \approx 1.41$. The same strategy was independently 
discovered and analysed by Icking et al. |0|. In the next section, we give an outline of 
how to search in streets and point out the subproblem that must be solved. In 
Section 3 an optimal strategy for this problem is presented. 



2 Searching in a Street 

The robot, modeled as a point, is located at the start position s in a simple 
polygon P. It has to find the target t. Both points, s and t, are vertices of P. 
Together with the polygon, they form a street subject to the following definition. 

Definition 1. A simple polygon P in the plane with two distinguished vertices 
s and t on its boundary is called a street if the two boundary chains from s to t 
are weakly mutually visible. 

An example of a street is depicted in Figure 1(a). We briefly summarize 
some facts about searching in streets (see |Pll2liS|). Due to the simple lower 
bound example shown in Figure 1(b), there is no strategy with a competitive 
ratio less than $\sqrt{2}$ 0. If a strategy moves to the left or right before seeing t, 
then t can be placed on the opposite side, thus forcing the robot to travel more 
than $\sqrt{2}$ times the diagonal. 

Crucial for search strategies is how they behave in funnels, shown as shaded 
areas in Figure CKa). A funnel consists of two reflex chains induced by vertices 
of the street polygon. A reflex chain is a polygonal chain all of whose vertices 
have an interior angle larger than $\pi$. Klein shows that if a strategy achieves 
a competitive factor c in funnels, then it can be embedded in a so-called high 
level strategy to provide a c-competitive strategy for searching in streets 0. 
Outside of funnels, the high level strategy takes the shortest path as depicted in 
Figure 0(a). 

To examine the funnel situation, we introduce some notations. The two 
reflex chains start in a common point $p_1$ and end in the vertices $v_n$ and $w_n$, cf. 
Figure 0(a). A strategy to search in a funnel will know whether to go to $v_n$ or 




Fig. 1. (a) An optimal search path in a street (b) A lower bound for searching 
in rectilinear streets 

$w_n$ before reaching the line segment $v_n w_n$ IS)>, where we assume that no three 
vertices of the street are collinear. 

For a path point $p_i$ in a funnel, we denote the most advanced visible point 
on the left chain by $v_i$ and the most advanced visible point on the right chain 
by $w_i$. The last path point in the funnel is denoted by $p_{n+1}$ which equals $v_n$ or 
$w_n$. The path point before going straight to $p_{n+1}$ is denoted by $p_n$. 

We define the visibility angle of a point $p_i$ to be the angle between the line 
segments from $p_i$ to $v_i$ and to $w_i$, having a value of $2\gamma_i$ which is always $\le \pi$. The 
visibility angle of $p_1$ is called the opening angle of the funnel. 

For a point q in the triangle $v_i p_i w_i$, we denote the angle between $v_i p_i$ and $v_i q$ 
by $\alpha_q$ and the angle between $w_i p_i$ and $w_i q$ by $\beta_q$, cf. Figure I^Jb). Note that the 
value of the visibility angle of q is $2\gamma_q = 2\gamma_i + \alpha_q + \beta_q$. Furthermore, we define 
$\delta_i := \gamma_i - \pi/4$, $\gamma_{n+1} := \gamma_n$, and $v_0 := w_0 := p_1$. The line segment 
between two points a and b is denoted by ab and its length by $|ab|$. 

As proved by Lopez-Ortiz and Schuierer, a c-competitive strategy for funnels 
with opening angle $\ge \pi/2$ can be extended by their strategy clad (continuous 
lad) to a c-competitive strategy for arbitrary funnels jl 4j . Thus, we only consider 
funnels with opening angle $\ge \pi/2$. 

3 The Strategy glad 

We shall define a strategy to traverse funnels by specifying a condition which 
must be fulfilled by two path points $p_i$ and $p_{i+1}$. In the following, these two points 
are always chosen such that on the path between them (without end-points) no 
new vertex becomes visible. 

3.1 Outline and Correctness of the Strategy 

The strategy glad (generalized clad) can be regarded as a generalization of the 
strategy clad proposed by Lopez-Ortiz and Schuierer H3I where every two path 
points $p_i$ and $p_{i+1}$, $i < n$, fulfill the condition $|p_i v_i| - |p_{i+1} v_i| = |p_i w_i| - |p_{i+1} w_i|$. 
For a path point $p_i$ and a point q with visibility angle $2\gamma_q$ in the triangle $v_i p_i w_i$ 
we define 

$$l_{p_i,q} := |p_i v_i|\,g(\gamma_i) - |q v_i|\,g(\gamma_q), \qquad D^l_{p_i,q} := |p_i q| / l_{p_i,q},$$

$$r_{p_i,q} := |p_i w_i|\,g(\gamma_i) - |q w_i|\,g(\gamma_q), \qquad D^r_{p_i,q} := |p_i q| / r_{p_i,q},$$

where g is the glad function $g : [\pi/4, \pi/2] \to [0, 1]$, $g(\gamma) = \cos(\gamma - \pi/4)$. 

The condition for the strategy glad is an extension of the clad condition to 

$$l_{p_i,p_{i+1}} = r_{p_i,p_{i+1}},\; i < n, \quad\text{i.e.}\quad D^l_{p_i,p_{i+1}} = D^r_{p_i,p_{i+1}} \qquad \text{(glad condition).}$$

Two segments of length $l_{p_i,p_{i+1}} = r_{p_i,p_{i+1}}$ are depicted in Figure 3. 
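The quantities in the glad condition can be computed directly from coordinates. The following is a small sketch under the assumption that a point pair is tested in a single triangle with fixed vertices v and w; the coordinates and helper names are hypothetical and chosen only to illustrate the definitions.

```python
import math


def glad(gamma: float) -> float:
    """Glad function g(gamma) = cos(gamma - pi/4)."""
    return math.cos(gamma - math.pi / 4.0)


def angle_at(apex, a, b) -> float:
    """Angle at `apex` between the directions to a and b."""
    ux, uy = a[0] - apex[0], a[1] - apex[1]
    vx, vy = b[0] - apex[0], b[1] - apex[1]
    return math.acos((ux * vx + uy * vy) / (math.hypot(ux, uy) * math.hypot(vx, vy)))


def glad_terms(p, q, v, w):
    """Return (l, r) for a path point p and a later point q in the triangle v p w."""
    gamma_p = angle_at(p, v, w) / 2.0   # half of the visibility angle of p
    gamma_q = angle_at(q, v, w) / 2.0   # half of the visibility angle of q
    l = math.dist(p, v) * glad(gamma_p) - math.dist(q, v) * glad(gamma_q)
    r = math.dist(p, w) * glad(gamma_p) - math.dist(q, w) * glad(gamma_q)
    return l, r


if __name__ == "__main__":
    v, w = (-1.0, 0.0), (1.0, 0.0)
    p, q = (0.0, -1.0), (0.0, -0.4)      # symmetric situation: q on the bisector
    l, r = glad_terms(p, q, v, w)
    print(l, r, math.dist(p, q) / l)      # l equals r, so D^l equals D^r
```

In the symmetric situation the bisector satisfies the glad condition, which matches the remark in the first paper that the symmetric case means walking along the bisector.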



Fig. 3. The points $p_i$ and $p_{i+1}$ fulfill the glad condition 



It is possible to obtain an outline of the glad path if only a few point pairs 
have to fulfill the glad condition; cf. Figure 4. Every point pair with no new 
visible vertex on the connecting path fulfills the glad condition, e.g. point pairs 
$(p_1, p_2)$ and $(p_1, p_3)$. Note that if $(p_i, p_{i+1})$ and $(p_{i+1}, p_{i+2})$ fulfill the glad 
condition, then $(p_i, p_{i+2})$ also does. 




Fig. 4. A polygonal path passing through a finite number of path points 



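Before stating Lemma 1 we give the domains (if not restricted further in the 
text) of some frequently used variables: 

$$\frac{\pi}{4} \le \gamma_i \le \gamma_{i+1} \le \frac{\pi}{2}, \qquad 0 \le \delta_i = \gamma_i - \frac{\pi}{4} \le \frac{\pi}{4}, \qquad \alpha_q + \beta_q \le \pi - 2\gamma_i .$$

Lemma 1. Let q be a point in the triangle $v_i p_i w_i$. The segment $p_i q$ divides the 
visibility angle of $p_i$ into $\gamma^l$ and $\gamma^r$; cf. Figure ^b). Then 

$$D^l_{p_i,q} = \frac{\sin\alpha_q}{\sin(\gamma^l + \alpha_q)\cos\delta_i - \sin\gamma^l \cos\bigl(\delta_i + \frac{\alpha_q + \beta_q}{2}\bigr)} ,$$

$$D^r_{p_i,q} = \frac{\sin\beta_q}{\sin(\gamma^r + \beta_q)\cos\delta_i - \sin\gamma^r \cos\bigl(\delta_i + \frac{\alpha_q + \beta_q}{2}\bigr)} .$$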



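Proof. Let the value of the visibility angle of q be $2\gamma_q$. Dividing numerator and 
denominator of $D^l_{p_i,q}$ by $|p_i q|$ and applying the law of sines in the triangle $v_i p_i q$ 
yields 

$$D^l_{p_i,q} = \frac{|p_i q|}{|p_i v_i|\,g(\gamma_i) - |q v_i|\,g(\gamma_q)} = \frac{\sin\alpha_q}{\sin(\pi - \gamma^l - \alpha_q)\,g(\gamma_i) - \sin\gamma^l\, g\bigl(\gamma_i + \frac{\alpha_q + \beta_q}{2}\bigr)}$$

$$= \frac{\sin\alpha_q}{\sin(\gamma^l + \alpha_q)\cos\delta_i - \sin\gamma^l \cos\bigl(\delta_i + \frac{\alpha_q + \beta_q}{2}\bigr)} .$$

The proof for $D^r_{p_i,q}$ is analogous. □ 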
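The closed form of Lemma 1 can be compared with the direct definition of $D^l_{p_i,q}$ on a concrete triangle. This is a sketch with hypothetical coordinates; it assumes the formula $D^l = \sin\alpha_q / [\sin(\gamma^l + \alpha_q)\cos\delta_i - \sin\gamma^l\cos(\delta_i + (\alpha_q + \beta_q)/2)]$ and uses illustrative helper names.

```python
import math


def glad(gamma: float) -> float:
    """Glad function g(gamma) = cos(gamma - pi/4)."""
    return math.cos(gamma - math.pi / 4.0)


def angle_at(apex, a, b) -> float:
    """Angle at `apex` between the directions to a and b."""
    ux, uy = a[0] - apex[0], a[1] - apex[1]
    vx, vy = b[0] - apex[0], b[1] - apex[1]
    return math.acos((ux * vx + uy * vy) / (math.hypot(ux, uy) * math.hypot(vx, vy)))


def d_left_direct(p, q, v, w) -> float:
    """D^l from its definition |pq| / (|pv| g(gamma_p) - |qv| g(gamma_q))."""
    gamma_p = angle_at(p, v, w) / 2.0
    gamma_q = angle_at(q, v, w) / 2.0
    return math.dist(p, q) / (math.dist(p, v) * glad(gamma_p) - math.dist(q, v) * glad(gamma_q))


def d_left_lemma(p, q, v, w) -> float:
    """D^l from the trigonometric expression of Lemma 1."""
    delta = angle_at(p, v, w) / 2.0 - math.pi / 4.0
    alpha = angle_at(v, p, q)      # angle between v p and v q
    beta = angle_at(w, p, q)       # angle between w p and w q
    gamma_l = angle_at(p, v, q)    # left part of the visibility angle of p
    denominator = (math.sin(gamma_l + alpha) * math.cos(delta)
                   - math.sin(gamma_l) * math.cos(delta + (alpha + beta) / 2.0))
    return math.sin(alpha) / denominator


if __name__ == "__main__":
    v, w, p, q = (-1.0, 0.0), (1.0, 0.0), (0.0, -1.0), (0.1, -0.5)
    print(d_left_direct(p, q, v, w), d_left_lemma(p, q, v, w))
```

Both evaluations agree up to rounding for any q inside the triangle, since the lemma is just the law of sines applied to the definition.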



Before analyzing the strategy, we show that the glad path approaches a 
point on $v_n w_n$. Therefore, we examine the path in one triangle $v_i p_i w_i$. In the 
remainder of the paper we assume that the points $p_i$, $v_i$, and $w_i$ always satisfy the 
condition $|p_i v_i| \le |p_i w_i|$. Of course, the analysis is analogous if $|p_i v_i| > |p_i w_i|$. 



Lemma 2. The glad path approaches $v_i w_i$ in a triangle $v_i p_i w_i$ as long as no 
new vertex becomes visible. Every path point $p_k$ lies in the interior of the triangle 
$v_i p_i w_i$ to the left of or on the bisector of $p_i$'s visibility angle (that is, $\gamma^l \le \gamma^r$), 
and $l_{p_i,p_k} = r_{p_i,p_k} > 0$. 

The proof is omitted due to space limitations. 

Lemma 2 implies that the glad path approaches a point on $v_n w_n$ since if a new 
vertex becomes visible on the path in the triangle $v_i p_i w_i$, then the continuing 
path approaches $v_{i+1} w_{i+1}$ of the next triangle. Two example paths are shown 
in Figure 5. 




Fig. 5 . Two glad paths 






Sven Schuierer and Ines Semrau 



3.2 Analysis of the Strategy 

We analyze the competitive ratio of the strategy glad by approximating the 
path with polygonal paths. This idea (Lemma 3 and Theorem 1) is taken from 
Lopez-Ortiz and Schuierer PI; Lemma 3 is only extended by the glad function. 
With this lemma, the ratio of the length of a polygonal path (connecting a 
finite number of points $p_i$) to the shortest path length can be estimated by 
regarding every segment $p_i p_{i+1}$ and a corresponding portion of the shortest path. 
An example is depicted in Figure 0 

We denote the shortest path in the funnel from a to b by $S(a,b)$, the glad 
path by $G(a,b)$, and the length of a path X by $|X|$. 



Lemma 3. Let $p_i$, $1 \le i \le n$, be path points in a funnel with opening angle 
$\ge \pi/2$ and let g be the glad function. If $p_{n+1} = v_n$ and $D^l_{p_i,p_{i+1}} \le c$, or $p_{n+1} = w_n$ 
and $D^r_{p_i,p_{i+1}} \le c$, for all $1 \le i \le n$, then the length of the polygonal path through 
the points $p_i$, $1 \le i \le n + 1$, is at most $c\,|S(p_1,p_{n+1})|$. 

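Proof. Let $\Pi$ denote the polygonal path. We have to estimate $|\Pi|/|S(p_1,p_{n+1})|$ 
and assume $p_{n+1} = w_n$. The case $p_{n+1} = v_n$ is symmetric. If no new vertex is 
visible in $p_{i+1}$, then $|w_i w_{i+1}| = 0$, else $|w_i w_{i+1}| > 0$. In either case, 
$|w_i w_{i+1}| = |p_{i+1} w_{i+1}| - |p_{i+1} w_i|$, $0 \le i < n$; cf. Figure 0a). This equation together with 
$|p_1 w_0| = 0$ and $|p_{n+1} w_n| = 0$ yields 

$$\sum_{i=1}^{n} r_{p_i,p_{i+1}} = \sum_{i=1}^{n} \bigl[\,|p_i w_i|\,g(\gamma_i) - |p_{i+1} w_i|\,g(\gamma_{i+1})\bigr] - |p_1 w_0|\,g(\gamma_1) + |p_{n+1} w_n|\,g(\gamma_{n+1})$$

$$= \sum_{i=0}^{n-1} \bigl(|p_{i+1} w_{i+1}| - |p_{i+1} w_i|\bigr)\,g(\gamma_{i+1}) \;\le\; \sum_{i=0}^{n-1} |w_i w_{i+1}| = |S(p_1,p_{n+1})| .$$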

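By the inequality $(a_1 + a_2)/(b_1 + b_2) \le \max\{a_1/b_1,\, a_2/b_2\}$, which holds for $b_1$, 
$b_2 > 0$, and the above lower bound on $|S(p_1,p_{n+1})|$ it follows that 

$$\frac{|\Pi|}{|S(p_1,p_{n+1})|} \;\le\; \frac{\sum_{i=1}^{n} |p_i p_{i+1}|}{\sum_{i=1}^{n} r_{p_i,p_{i+1}}} \;\le\; \max_{1 \le i \le n} \frac{|p_i p_{i+1}|}{r_{p_i,p_{i+1}}} \;=\; \max_{1 \le i \le n} D^r_{p_i,p_{i+1}} \;\le\; c ;$$

here, $r_{p_i,p_{i+1}} > 0$ is required in the second inequality, for $1 \le i \le n$, which is 
ensured by Lemma 2. □ 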
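The second inequality in the proof is the usual bound of a ratio of sums by the largest individual ratio. The sketch below illustrates it on hypothetical segment lengths and denominators; the numbers and function names are for illustration only.

```python
def ratio_of_sums(numerators, denominators) -> float:
    """(a_1 + ... + a_n) / (b_1 + ... + b_n) for positive b_i."""
    return sum(numerators) / sum(denominators)


def largest_ratio(numerators, denominators) -> float:
    """max_i a_i / b_i for positive b_i."""
    return max(a / b for a, b in zip(numerators, denominators))


if __name__ == "__main__":
    # Hypothetical segment lengths |p_i p_{i+1}| and positive terms r_{p_i, p_{i+1}}.
    segment_lengths = [0.30, 0.50, 0.20]
    r_terms = [0.25, 0.40, 0.10]
    print(ratio_of_sums(segment_lengths, r_terms), largest_ratio(segment_lengths, r_terms))
```

Positivity of every denominator is essential; with a negative or zero term the bound can fail, which is why the proof appeals to Lemma 2.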



If we approximate the glad path by polygonal paths and estimate their lengths 
by Lemma 3 then the optimality of the strategy glad can be proved as follows. 



Theorem 1. The strategy glad is $\sqrt{2}$-competitive in funnels with opening angle 
$\ge \pi/2$. 

Proof. We connect a finite number of points $p_i$ of the glad path $G(p_1,p_n)$ and 
the point $p_{n+1}$ to a polygonal path. The length of the glad path $G(p_1,p_{n+1})$ is 
the supremum of the lengths of all such polygonal paths. Let $M_\alpha$ be the set of 
all polygonal paths $\Pi$ for which $\alpha$, $\alpha \in (0, \pi/2)$, is an upper bound on $\alpha_{p_{i+1}}$. 



An Optimal Strategy for Searching in Unknown Streets 127 



For the supremum, it suffices to consider the polygonal paths from a single set 
$M_\alpha$ since every polygonal path outside this set can be extended to one contained 
in it by inserting additional points. 

To estimate the length of a polygonal path $\Pi \in M_\alpha$ by Lemma 3 we first 
show the upper bound $c := \sqrt{2}/(1 - \sin\alpha)$ for all values $D^l_{p_i,p_{i+1}}$ and $D^r_{p_i,p_{i+1}}$, 
$1 \le i \le n$. For $i < n$, the points $p_i$ and $p_{i+1}$ satisfy the glad condition. We prove 
in Lemma 5 that for two such points the values $D^l_{p_i,p_{i+1}}$ and $D^r_{p_i,p_{i+1}}$ are at most 
$\sqrt{2}/(1 - \sin\alpha_{p_{i+1}})$, implying an upper bound of c on them as $\alpha_{p_{i+1}} \le \alpha < \pi/2$. 

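For $i = n$, we obtain (cf. Figure 0) 

$$D^r_{p_n,p_{n+1}} = \frac{|p_n p_{n+1}|}{|p_n w_n|\,g(\gamma_n) - 0} = \frac{|p_n w_n|}{|p_n w_n|\,g(\gamma_n)} = \frac{1}{g(\gamma_n)} \le \frac{1}{g(\pi/2)} = \sqrt{2} \le c .$$

Analogously, $D^l_{p_n,p_{n+1}} \le c$. By Lemma 3 every polygonal path from the set $M_\alpha$ 
has a length of at most $c\,|S(p_1,p_{n+1})|$. Since every set $M_\alpha$, $\alpha \in (0, \pi/2)$, yields 
an upper bound on the length of the glad path, we obtain 

$$|G(p_1,p_{n+1})| \;\le\; \inf_{\alpha \in (0,\pi/2)} \frac{\sqrt{2}}{1 - \sin\alpha} \cdot |S(p_1,p_{n+1})| \;=\; \sqrt{2}\,|S(p_1,p_{n+1})| .$$

□ 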



It remains to prove the bound $\sqrt{2}/(1 - \sin\alpha_{p_{i+1}})$ for $D^l_{p_i,p_{i+1}}$ $(= D^r_{p_i,p_{i+1}})$ 
if $p_i$ and $p_{i+1}$ satisfy the glad condition. From now on, we only deal with the 
path point $p_{i+1}$ and write $\alpha_{p_{i+1}}$ and $\beta_{p_{i+1}}$ without indices. Since $p_{i+1}$ lies in the 
interior of the triangle $v_i p_i w_i$ by Lemma 2, i.e. $\alpha$ and $\beta$ are positive, we define 
$z := \beta/\alpha$, and $z > 0$ holds. We need the following function definitions and the 
properties thereafter which hold for the path point $p_{i+1}$. 



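Definition 2. Let $\delta_i := \gamma_i - \pi/4$, $x \in (0, \pi/2)$, $\gamma_i \in [\pi/4, \pi/2)$, $z, \alpha > 0$. The 
functions T, A, A*, B, B*, and D*, whose (additional) arguments z, $\gamma_i$, and (if 
depending on) $\alpha$ are omitted for the sake of readability, are defined as follows. 

$$T(x) := \frac{\cos x - \cos\frac{(1+z)\alpha}{2} + \tan\delta_i \,\sin\frac{(1+z)\alpha}{2}}{\sin x}$$

$$A := 1 - \frac{T(z\alpha)}{\tan\gamma_i} \quad\text{and}\quad A^* := 1 - \frac{\tan\delta_i}{\tan\gamma_i}$$

$$B := \frac{T(z\alpha) - T(\alpha)}{\sin(2\gamma_i)} \quad\text{and}\quad B^* := 2\,\frac{1 - z}{1 + z}\cdot\frac{\tan\delta_i}{\sin(2\gamma_i)}$$

$$D^* := \frac{\sqrt{1 + \tan^2\gamma^l}}{\cos\delta_i + \frac{1+z}{2}\,\tan\gamma^l \,\sin\delta_i}$$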



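The functions A* and B* are closely related to their corresponding limit 
functions $A_0 := \lim_{\alpha \to 0} A$ and $B_0 := \lim_{\alpha \to 0} B$. In particular, 

$$A^* = A_0 + \frac{1 - z}{2z}\cdot\frac{\tan\delta_i}{\tan\gamma_i} \quad\text{and}\quad B^* = B_0 \cdot \frac{4z}{(1 + z)^2} . \qquad (1)$$

The function D* results from transforming $D^l_{p_i,p_{i+1}}$ by 

$$\sin(y + x)\cos\delta_i - \sin y \,\cos\Bigl(\delta_i + \frac{(1+z)\alpha}{2}\Bigr) = \sin x \,\cos\delta_i \,\bigl[\cos y + T(x)\sin y\bigr] \qquad (2)$$

to 

$$D^l_{p_i,p_{i+1}} = \frac{1}{\cos\delta_i \,\bigl(\cos\gamma^l + T(\alpha)\sin\gamma^l\bigr)} = \frac{\sqrt{1 + \tan^2\gamma^l}}{\cos\delta_i \,\bigl(1 + T(\alpha)\tan\gamma^l\bigr)}$$

and replacing $T(\alpha)$ by its limit function $\frac{1+z}{2}\tan\delta_i$ for $\alpha \to 0$. 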
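Identity (2) and the limit of $T(\alpha)$ can be checked numerically. The sketch below assumes $T(x) = [\cos x - \cos\frac{(1+z)\alpha}{2} + \tan\delta_i \sin\frac{(1+z)\alpha}{2}] / \sin x$ as the definition of T; the sample parameter values and function names are arbitrary illustrations.

```python
import math


def t_func(x: float, z: float, alpha: float, delta: float) -> float:
    """T(x) with the arguments z, alpha and delta_i made explicit (assumed form)."""
    m = (1.0 + z) * alpha / 2.0
    return (math.cos(x) - math.cos(m) + math.tan(delta) * math.sin(m)) / math.sin(x)


def identity_lhs(x, y, z, alpha, delta) -> float:
    """Left side of identity (2)."""
    m = (1.0 + z) * alpha / 2.0
    return math.sin(y + x) * math.cos(delta) - math.sin(y) * math.cos(delta + m)


def identity_rhs(x, y, z, alpha, delta) -> float:
    """Right side of identity (2)."""
    return math.sin(x) * math.cos(delta) * (math.cos(y) + t_func(x, z, alpha, delta) * math.sin(y))


if __name__ == "__main__":
    x, y, z, alpha, delta = 0.3, 0.7, 0.6, 0.3, 0.2
    print(identity_lhs(x, y, z, alpha, delta), identity_rhs(x, y, z, alpha, delta))
    small = 1e-4
    # T(alpha) approaches (1 + z)/2 * tan(delta) and T(z*alpha) approaches (1 + z)/(2z) * tan(delta).
    print(t_func(small, z, small, delta), (1 + z) / 2 * math.tan(delta))
    print(t_func(z * small, z, small, delta), (1 + z) / (2 * z) * math.tan(delta))
```

The identity holds for every x, so it applies both with $x = \alpha$ (left side) and with $x = z\alpha = \beta$ (right side).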

Lemma 4. Let $p_i$ and $p_{i+1}$ be two path points with $\gamma_{i+1} > \gamma_i$, and let $\delta_i := 
\gamma_i - \pi/4$. The segment $p_i p_{i+1}$ divides the visibility angle of $p_i$ into $\gamma^l$ and $\gamma^r$. 
Then, with the functions from Definition 2, 

(a) $z \in (0, 1]$ 

(b) $\tan\gamma^l = A \tan\gamma_i / (A + B)$ 

(c) $A + B > 0$ and $A > 0$ 

(d) $A \le A^*$ and $B \ge B^*$ 

(e) $D^l_{p_i,p_{i+1}} \le D^* / (1 - \sin\alpha)$. 

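Proof. (a): We have already stated $z > 0$. The proof for the upper bound, $z \le 1$, 
is by contradiction. Assume $z > 1$. We show that this implies $D^l_{p_i,p_{i+1}} < D^r_{p_i,p_{i+1}}$, 
hence z must be $\le 1$. The assumption $z > 1$ is equivalent to $\beta > \alpha$ and yields 
for the expressions from Lemma 1 

$$1/D^l_{p_i,p_{i+1}} > \cos(\gamma^l - \delta_i) \quad\text{and}\quad 1/D^r_{p_i,p_{i+1}} < \cos(\gamma^r - \delta_i) .$$

By Lemma 2 we have $0 < \gamma^l \le \gamma_i$ and $\gamma_i \le \gamma^r < 2\gamma_i$, for which the inequality 
$\cos(\gamma^l - \delta_i) \ge \cos(\gamma^r - \delta_i)$ can easily be shown. Hence, $D^l_{p_i,p_{i+1}} < D^r_{p_i,p_{i+1}}$, 
which contradicts the glad condition. 

(b): Starting with $D^l_{p_i,p_{i+1}} = D^r_{p_i,p_{i+1}}$ and solving for $\tan\gamma^l$ yields the claim by 
using $\gamma^r = 2\gamma_i - \gamma^l$ and (2). 

The proofs of (c)-(e) are based on $T(\alpha) < \lim_{\alpha \to 0} T(\alpha) = \frac{1+z}{2}\tan\delta_i$ and 
$T(z\alpha) > \lim_{\alpha \to 0} T(z\alpha) = \frac{1+z}{2z}\tan\delta_i$ which can easily be shown. From this follows 
that $A + B \ge 1 - T(z\alpha) + T(z\alpha) - T(\alpha) > 0$, $A < A_0$, and $B > B_0 \ge 0$, where 
$A_0$ and $B_0$ again denote the limit functions of A and B for $\alpha \to 0$. $A_0 \le A^*$ 
and $B_0 \ge B^*$ follows from (1) and $B_0 \ge 0$ and proves (d). From $A + B > 0$, 
$\tan\gamma^l > 0$, and Lemma 4(b), we obtain $A > 0$. The proof of (e) is omitted due 
to space limitations. □ 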

Now we prove an upper bound for $D^l_{p_i,p_{i+1}}$ $(= D^r_{p_i,p_{i+1}})$, $p_i$ and $p_{i+1}$ being two 
points satisfying the glad condition. Replacing the occurrences of $\gamma^l$ in $D^l_{p_i,p_{i+1}}$ 
via Lemma 4(b) yields $D^l_{p_i,p_{i+1}}$ as a function of $\alpha$, z and $\gamma_i$. A numerical analysis 
of this function (with respect to the conditions $A > 0$ and $z \in (0, 1]$ from 
Lemma 4) suggests that the function achieves its maximal value for $z = 1$. 
In the following analysis of the function, we show the upper bound $\sqrt{2}/(1 - \sin\alpha)$ 
which is sufficient for our purpose. 

Lemma 5. For two points $p_i$ and $p_{i+1}$ satisfying the glad condition, 
$D^l_{p_i,p_{i+1}} = D^r_{p_i,p_{i+1}} \le \sqrt{2}/(1 - \sin\alpha)$. 

Proof. Because of Lemma 4(e), it suffices to show that $D^* \le \sqrt{2}$. Let $f_1$ and $f_2$ 
denote the numerator and denominator of D*, respectively. We use $A + B > 0$ 
from Lemma 4(c) and prove $(A + B)^2 (2 f_2^2 - f_1^2) \ge 0$ which implies $2 \ge f_1^2/f_2^2$, 
hence $\sqrt{2} \ge f_1/f_2 = D^*$. In addition to the functions $f_1$ and $f_2$, we define the 
two functions 

$$Q(\gamma_i, z) := (1 - z)(1 - \tan\gamma_i)$$

$$R(\gamma_i, z) := 2\sin(2\gamma_i) - (1 + z)\cos(2\gamma_i)\tan\gamma_i$$

Since $-\cos(2\gamma_i)/\sin(2\gamma_i) = (\tan^2\gamma_i - 1)/(2\tan\gamma_i)$ and $z \le 1$, 

$$B^* \cdot R(\gamma_i, z) \ge (1 - z)\tan\delta_i \,(1 + \tan^2\gamma_i) . \qquad (3)$$

Furthermore, since $\tan\delta_i = (\tan\gamma_i - 1)/(1 + \tan\gamma_i)$, 

$$A^* \tan\gamma_i \,(1 - \tan\gamma_i) = \tan\delta_i \,(-\tan^2\gamma_i - 1) . \qquad (4)$$

Now, using $\cos\delta_i = (\cos\gamma_i + \sin\gamma_i)/\sqrt{2}$, $\sin\delta_i = (\sin\gamma_i - \cos\gamma_i)/\sqrt{2}$ and 
replacing $\tan\gamma^l$ via Lemma 4(b), we obtain that $(A + B)^2 (2 f_2^2 - f_1^2)$ equals 

$$(A + B)^2 (\cos\gamma_i + \sin\gamma_i)^2 + 2 (A + B)(\sin^2\gamma_i - \cos^2\gamma_i)\,\frac{1 + z}{2}\, A \tan\gamma_i$$
$$+ \frac{(1 + z)^2}{4}\, A^2 \tan^2\gamma_i \,(\sin\gamma_i - \cos\gamma_i)^2 - (A + B)^2 - A^2 \tan^2\gamma_i .$$

If we subtract $2B^2 \sin\gamma_i \cos\gamma_i$, use $(1 + z)^2/4 \ge z$, and simplify the resulting 
expression, we obtain 

$$(A + B)^2 (2 f_2^2 - f_1^2) \ge A^2 \cdot \tan\gamma_i \, Q(\gamma_i, z) + AB \cdot R(\gamma_i, z) ;$$

since $Q(\gamma_i, z) \le 0$ and $R(\gamma_i, z) \ge 0$ because of $z \le 1$, $\tan\gamma_i \ge 1$ and $\cos(2\gamma_i) \le 0$, 
we obtain by Lemma 4(c) and (d) 

$$\ge A \bigl(A^* \tan\gamma_i \cdot Q(\gamma_i, z) + B^* \cdot R(\gamma_i, z)\bigr)$$

and use (3) and (4) to finally get 

$$\ge A\,(1 - z)\bigl[A^* \tan\gamma_i \,(1 - \tan\gamma_i) + \tan\delta_i \,(1 + \tan^2\gamma_i)\bigr] = 0 .$$

□ 
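The two auxiliary relations (3) and (4) used in the proof can be checked on a grid of parameters. This is a sketch that assumes $A^* = 1 - \tan\delta_i/\tan\gamma_i$ and $B^* = 2(1 - z)\tan\delta_i / ((1 + z)\sin 2\gamma_i)$ with $\delta_i = \gamma_i - \pi/4$; the function names and the grid are illustrative.

```python
import math


def a_star(gamma: float) -> float:
    """A* = 1 - tan(delta) / tan(gamma) with delta = gamma - pi/4 (assumed form)."""
    return 1.0 - math.tan(gamma - math.pi / 4.0) / math.tan(gamma)


def b_star(gamma: float, z: float) -> float:
    """B* = 2 (1 - z) tan(delta) / ((1 + z) sin(2 gamma)) (assumed form)."""
    return 2.0 * (1.0 - z) * math.tan(gamma - math.pi / 4.0) / ((1.0 + z) * math.sin(2.0 * gamma))


def r_func(gamma: float, z: float) -> float:
    """R(gamma, z) = 2 sin(2 gamma) - (1 + z) cos(2 gamma) tan(gamma)."""
    return 2.0 * math.sin(2.0 * gamma) - (1.0 + z) * math.cos(2.0 * gamma) * math.tan(gamma)


def gap_3(gamma: float, z: float) -> float:
    """Left side minus right side of inequality (3); should be non-negative."""
    delta = gamma - math.pi / 4.0
    return b_star(gamma, z) * r_func(gamma, z) - (1.0 - z) * math.tan(delta) * (1.0 + math.tan(gamma) ** 2)


def gap_4(gamma: float) -> float:
    """Left side minus right side of equation (4); should vanish."""
    delta = gamma - math.pi / 4.0
    t = math.tan(gamma)
    return a_star(gamma) * t * (1.0 - t) - math.tan(delta) * (-t * t - 1.0)


if __name__ == "__main__":
    gammas = [math.pi / 4.0 + k * (1.5 - math.pi / 4.0) / 20 for k in range(21)]
    zs = [0.05 * k for k in range(1, 21)]
    print(min(gap_3(g, z) for g in gammas for z in zs))
    print(max(abs(gap_4(g)) for g in gammas))
```

The first printed value stays non-negative (up to rounding) and the second stays at rounding level, in line with (3) being an inequality and (4) an identity.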

Lemma 5 completes the proof of Theorem 1, the optimality of the strategy glad. 
Together with the lower bound $\sqrt{2}$, we have the following result. 

Theorem 2. For robots with vision system, the strategy glad, combined with 
the strategy clad and embedded into the high level strategy, is a $\sqrt{2}$-competitive 
strategy for target searching in streets, and this is optimal. 

4 Conclusions 

We have solved the target searching problem for streets by presenting an optimal 
strategy for funnels. The strategy consists of two parts and changes from clad to 
glad upon reaching the visibility angle $\pi/2$. Regarding the strategy and the path, 
two questions arise. Is it possible to traverse a funnel optimally without changing 
the strategy? And is it possible to traverse a funnel with a path consisting 
only of line segments? Concerning the second question, the strategy clad can be 
replaced without losing optimality by following the bisector of the first visibility 
angle until a new vertex becomes visible and repeating this until the visibility 
angle $\pi/2$ is reached 1 1 1 1 1 isj . For the remaining part of the funnel, Lopez-Ortiz 
already shows that an optimal path of line segments cannot be obtained if the 
robot changes its direction only on seeing a new vertex m. It is questionable 
whether optimality can be achieved here with a finite number of additional direction 
changes. 

Abstract. We investigate parallel searching on m concurrent rays. We 
assume that a target t is located somewhere on one of the rays; we are 
given a group of m point robots each of which has to reach t. 
Furthermore, we assume that the robots have no way of communicating over 
distance. Given a strategy S we are interested in the competitive ratio 
defined as the ratio of the time needed by the robots to reach t using S 
and the time needed to reach t if the location of t is known in advance. 

If a lower bound on the distance to the target is known, then there is a 
simple strategy which achieves a competitive ratio of 9, independent 
of m. We show that 9 is a lower bound on the competitive ratio for two 
large classes of strategies if $m \ge 2$. 

If the minimum distance to the target is not known in advance, we show 
a lower bound on the competitive ratio of $1 + 2(k + 1)^{k+1}/k^k$ where 
$k = \lceil \log m \rceil$. We also give a strategy that obtains this ratio. 

1 Introduction 

Searching for a target is an important and well studied problem in robotics. In 
many realistic situations the robot does not possess complete knowledge about 
its environment, for instance, the robot may not have a map of its surroundings, 
or the location of the target may be unknown 0, 0J 0 0 0 IH3 [Q [01 [01 [0| • 
The search of the robot can be viewed as an on-line problem since the robot’s 
decisions about the search are based only on the part of its environment that 
it has seen so far. We use the framework of competitive analysis to measure the 
performance of an on-line search strategy ^ m- The competitive ratio of S is 
defined as the maximum of the ratio of the distance traveled by a robot using S 
to the optimal distance from its starting point to the target, over all possible 
locations in the environment of the target. 

A problem with paradigmatic status in this framework is searching on m 
concurrent rays. Here, a point robot or — as in our case — a group of point robots 
is imagined to stand at the origin of m concurrent rays. One of the rays contains 
the target t whose distance to the origin is unknown. A robot can detect t only 
if it stands on top of it. It can be shown that an optimal strategy for one robot is 

* This research is supported by the DFG-Project “Diskrete Probleme”, No. Ot 64/8-1. 



C. Meinel and S. Tison (Eds.): STACS’99, LNCS 1563, pp. 132-^^21 1999. 

to visit the rays in cyclic order, increasing the step length each time by a factor 
of $m/(m - 1)$ if it starts with a step length of 1 mill. The competitive ratio $C_m$ 
achieved by this strategy is given by 

$$C_m = 1 + 2\,\frac{m^m}{(m - 1)^{m-1}} .$$
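The ratio of the cyclic single-robot strategy can be approximated by direct summation and compared with the closed form $C_m = 1 + 2\,m^m/(m-1)^{m-1}$. The following is a sketch under simplifying assumptions: the target is placed just beyond the turning point of some step, so the robot walks all earlier steps out and back before reaching it; the function names are illustrative.

```python
def optimal_ratio(m: int) -> float:
    """Closed form 1 + 2 m^m / (m - 1)^(m - 1) for m >= 2 rays."""
    return 1 + 2 * m**m / (m - 1) ** (m - 1)


def cyclic_ratio(m: int, rounds: int = 60) -> float:
    """Worst observed ratio of the cyclic strategy with growth factor m / (m - 1).

    Step i explores a ray up to distance growth**i. A target just beyond the
    turning point of step j is reached in step j + m, after all steps
    0, ..., j + m - 1 have been walked out and back.
    """
    growth = m / (m - 1)
    worst = 0.0
    for j in range(rounds):
        walked = 2 * sum(growth**i for i in range(j + m))
        worst = max(worst, 1 + walked / growth**j)
    return worst


if __name__ == "__main__":
    for m in (2, 3, 4):
        print(m, optimal_ratio(m), cyclic_ratio(m))
```

For m = 2 this reproduces the familiar ratio 9 of the doubling strategy on the line; the simulated values approach the closed form from below as more rounds are considered.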

The lower bound for searching in m rays has proven to be a very useful tool for 
proving lower bounds for searching in a number of classes of simple polygons, 
such as star-shaped polygons ca, generalized streets PII5, HV-streets 0, 
$\theta$-streets 13 El, and k-spiral polygons CHI. 

In this paper we are interested in obtaining upper and lower bounds for the 
competitive ratio of parallel searching on m concurrent rays. 

Assume that a group of m point robots searches for the target. Neither the 
ray containing the target nor the distance to the target is known. Now all the 
robots have to reach the target and the only way two robots can communicate 
is if they meet, that is, they have no communication device. Baeza-Yates and 
Schott investigate searching on the real line, that is, the case m = 2 0 . They 
present two strategies both of which achieve a competitive ratio of 9. They also 
consider searching for a target line in the plane with multiple robots and present 
symmetric and asymmetric strategies. However, the question of optimality, that 
is, corresponding lower bounds, is not considered. 

In this paper we investigate search strategies for parallel searching on m 
concurrent rays. If a lower bound on the distance to the target is known, then 
there is a simple strategy that achieves a competitive ratio of 9, independent 
of m. We show that even in the case m = 2 there is a matching lower bound of 
9 on the competitive ratio of two large classes of strategies. Moreover, we show 
that, for all strategies, a lower bound of 9 for m = 2 implies a lower bound of 9 
for $m > 2$, as is to be expected. 

If the minimum distance to the target is not known in advance, then we show 
a lower bound on the competitive ratio of $1 + 2(k + 1)^{k+1}/k^k$ where $k = \lceil \log m \rceil$. 
We also present a strategy that achieves this competitive ratio. 

The paper is organized as follows. In the next section we present some 
definitions and preliminary results. In particular, we present three strategies to search 
on the line (m = 2), each with a competitive ratio of 9 and show that one of 
them can also be used to search on m rays with the same competitive ratio. In 
Section IHI we show a matching lower bound of 9 for strategies that are monotone 
or symmetric. Finally, in Section ^ we present an optimal algorithm to search 
on m rays if no lower bound on the distance to the target is known in advance. 

2 Preliminaries 

In the following we consider the problem of a group of m robots searching for
a target of unknown location on m rays in parallel. The robots have the same
maximal speed which we assume w.l.o.g. to be 1 distance unit per time unit. If
the robots have unbounded speed, then the time to find the target (both off-line
and on-line) can be made arbitrarily small. The speed of a robot may be positive
(if it moves away from the origin) or negative (if it moves towards the origin).

Let S be a strategy for parallel searching on m rays and $T_S(D)$ the maximum
time the group of robots needs to find and reach a target placed at a distance
of D if it uses strategy S. Since the maximum speed of a robot is one, the time
needed to reach the target if the position of the target is known is D time units.
The competitive ratio is now defined as the maximum of $T_S(D)/D$, over all
D > 0. In some applications a lower bound $D_{\min}$ on the distance to the target
may be known. If such a lower bound exists, then we assume without loss of
generality that $D_{\min} = 1$. It will turn out that the existence of $D_{\min}$ leads to a
drastically lower competitive ratio if m > 2.

We define different classes of possible strategies to search on m rays in par- 
allel. We say a strategy is monotone if, at all times, all the robots (that do not 
know the location of the target) have non-negative speed. We say a strategy 
is full speed if all the robots travel at a speed of 1 or −1 at all times. We say
a strategy is symmetric if, at all times, all the robots (that do not know the 
location of the target) have the same speed. 

We illustrate the different types of strategies for m = 2. The optimal mono- 
tone strategy is for each robot to travel at a speed of 1/3 on each ray. After 
one robot has found the target, it runs back to fetch the other. This leads to a 
competitive ratio of 9. This strategy is described in [2]. In the next section we
show a lower bound of 9 on the competitive ratio of monotone strategies. The 
optimal (full-speed) symmetric strategy is for each robot to double the distance 
that has been explored before and then to return to the origin. This strategy 
can only be applied if a lower bound on the distance to the target is known. It 
achieves a competitive ratio of 9. Again this strategy is described in [2] and we
show a lower bound of 9 on the competitive ratio of symmetric strategies in the 
next section. Finally, an asymmetric strategy is for both robots to walk together 
and to use the optimal strategy for one robot to search on two rays. This again 
yields a competitive ratio of 9. 
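
The monotone strategy for m = 2 can be checked with a few lines of exact arithmetic. The sketch below is only an illustration under the model stated above (unit maximum speed, robots start at the origin on opposite rays); the function names `monotone_two_ray_time` and `ratio` are our own and do not come from the paper.

```python
from fractions import Fraction


def monotone_two_ray_time(D, v=Fraction(1, 3)):
    """Time until both robots stand on a target at distance D (m = 2).

    Both robots leave the origin on opposite rays at constant speed v < 1.
    """
    D = Fraction(D)
    v = Fraction(v)
    t_find = D / v               # the finder reaches the target
    gap = 2 * D                  # the partner is at distance D on the other ray
    t_chase = gap / (1 - v)      # finder runs at speed 1, partner recedes at speed v
    meet = D + v * t_chase       # partner's distance from the origin when caught
    t_return = meet + D          # both run back through the origin to the target
    return t_find + t_chase + t_return


def ratio(D, v=Fraction(1, 3)):
    """Competitive ratio for a target at distance D."""
    return monotone_two_ray_time(D, v) / Fraction(D)
```

With v = 1/3 the three phases each take 3D time units, so the ratio is 9 for every D; other constant speeds such as 1/2 or 1/4 give a larger ratio in this model.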



3 With Lower Bound on the Minimum Distance 

In this section we assume that a lower bound on the distance from the origin to
the target of $D_{\min} = 1$ is known.

Initially we study the parallel search problem on two rays and prove a lower 
bound on the competitive ratio for monotone strategies. We can view the two 
rays as being the positive and negative parts of the real line with the two robots 
initially placed at the origin. As time passes the robots move continuously and 
monotonically with some speed along the line until one of them finds the target. 
This robot now travels at full speed to the other robot and communicates to it 
the location of the target and they both return to this target point. 

Let $v_1(T)$ and $v_2(T)$ be the average speeds of the two robots at time T, i.e.,
the distance of the robot to the origin at this time divided by the time. It is clear
that a search strategy is completely specified by the two average speed functions.
We can prove the following lower bound.

Lemma 1. There is no monotone strategy that achieves a better competitive 
ratio than 9 to search on two rays in parallel. 

Next we look at symmetric strategies. It is easy to show that only symmetric 
full speed strategies need to be considered. In the following we show a lower 
bound for symmetric strategies. We start with the simpler case when m = 2 and 
consider the general version later. 

Lemma 2. Let $\mathcal{Z}$ be the set of infinite positive sequences $Z = (z_0, z_1, z_2, \ldots)$
with $z_{2k} > z_{2k-2}$ and $z_{2k+1} > z_{2k-1}$, for all $k \ge 1$. If S is a symmetric strategy,
then the competitive ratio $C_S$ of S is at least

\[ \inf_{Z \in \mathcal{Z}} \sup_{k \ge 1} \; 1 + 2F_k(Z), \]

where

\[ F_k(Z) = \max\left\{ \frac{z_{2k} + z_{2k+1}}{z_{2k-1}},\; \frac{z_{2k+2} + z_{2k+3}}{z_{2k+2} - z_{2k}} \right\}. \]

Proof. Since the strategy is symmetric, the two robots each use the same local
strategy to search their own ray. Furthermore, as we mentioned above we can
assume that it is a full speed strategy. We can model a full speed strategy for
a robot by saying that it first moves a distance $x_0$ forward along the ray at full
speed, then it moves a distance $y_0$ backwards at full speed, then a distance $x_1$
forward, a distance $y_1$ backward, and so on. When one of the robots finds the
target, it runs back at full speed until it meets the other robot, and they both
run to the target at full speed.

The proof uses an adversary to place the target point in order to maximize 
the competitive ratio. 

We say that a robot is in step k when it moves forward and backward the
(k + 1)st time. Let $L_k$ denote the distance to the origin of the turning point where
the robot begins step k and let $U_k$ denote the distance to the origin of the turning
point where the robot starts to move backwards during step k. We have that

\[ L_k = L_{k-1} + x_{k-1} - y_{k-1} = \sum_{i=0}^{k-1} (x_i - y_i), \]

\[ U_k = L_k + x_k = x_k + \sum_{i=0}^{k-1} (x_i - y_i) = y_k + \sum_{i=0}^{k} (x_i - y_i). \]

The total time that the robot has travelled when it completes step k is

\[ T_k = \sum_{i=0}^{k} (x_i + y_i). \]
First of all, we can assume that $y_k \le U_k$ for all $k \ge 0$, that is, a robot always
stays on the same ray, since if this does not hold, we can exchange the strategy
for an equivalent one where, if the two robots meet at the origin, they exchange
places and continue on their own ray instead of the other robot's ray.

Secondly, we can assume that $U_{k-1} < U_k$, since otherwise, the strategy will
not explore any new part of the ray during step k, and we can exchange the
strategy for another equivalent one, where the assumption holds. In particular,
we can assume that $y_k < x_{k+1}$, for all $k \ge 0$.

Assume that the target is placed at distance D with $1 = D_{\min} \le D$ and
$U_{k-1} < D \le U_k$, for some $k \ge 0$. This means that one of the robots will find
the target during step k. If $T_R$ denotes the time for the robot that found
the target to reach the second robot at full speed, then the competitive ratio for
this placement is given by

\[ C_D = \frac{T_{k-1} + (D - L_k) + 2T_R}{D} = 1 + 2\,\frac{\sum_{i=0}^{k-1} y_i + T_R}{D}. \]

We will only consider two possible placements for the target and let the
adversary choose the one that maximizes the competitive ratio. For the first
placement of the target, we assume that the other robot is reached while it is
still in step k. (This is the best case for the strategy.) Since we place the target
in the interval $]U_{k-1}, U_k]$, the ratio $C_1^k$ of the time needed by the strategy to the
optimal time is given by

\[ C_1^k = \sup_{D \in ]U_{k-1}, U_k]} 1 + 2\,\frac{\sum_{i=0}^{k-1} y_i + T_R}{D} = 1 + 2\,\frac{\sum_{i=0}^{k-1} y_i + U_k}{U_{k-1}}. \]



Fig. 1. Robot A misses robot B if $D + L_{k+1} > U_k - D + U_k - L_{k+1}$.



In the second possible placement the robot that finds the target just fails to
reach the other robot during step k. Since both robots travel at full speed, the
earliest they can meet is during step k + 1. In order for robot A to miss
robot B the distance $D + L_{k+1}$ from the point where A finds the target to the
forward turning point of B at the beginning of step k + 1 has to be larger than
the distance from the point D on B's ray to B's backward turning point at $U_k$
plus the distance from $U_k$ to $L_{k+1}$ (see Figure 1), that is,

\[ L_{k+1} + D > U_k - D + U_k - L_{k+1} = U_k - D + y_k, \]

i.e., when $D > y_k$. The placement of the target is therefore restricted to the
interval $]y_k, U_k]$, which implies that the adversary places the target right after
$y_k$. The ratio of the time needed by the strategy to the optimal time is given by

\[ C_2^k = \sup_{D \in ]y_k, U_k]} 1 + 2\,\frac{\sum_{i=0}^{k-1} y_i + T_R}{D} = 1 + 2\,\frac{\sum_{i=0}^{k} y_i + U_{k+1}}{y_k}. \]



It may happen that independently of where the target is placed in the interval
$]U_{k-1}, U_k]$ the robot that finds the target will never miss the other robot during
step k. This means that $y_k \ge U_k$ because placing the target on the point $U_k$ and
requiring that the other robot is met during step k implies that the two robots
meet at the origin. On the other hand, we know from before that $y_k \le U_k$, and
hence, that $y_k = U_k$.

Now, we consider the situation if the adversary places the target right after
the point $U_k$. We obtain a competitive ratio of

\[ 1 + 2\,\frac{\sum_{i=0}^{k} y_i + U_{k+1}}{U_k} = 1 + 2\,\frac{\sum_{i=0}^{k} y_i + U_{k+1}}{y_k} = C_2^k. \]

Hence, $C_2^k$ is a lower bound for the competitive ratio regardless of
whether the robot that finds the target can or cannot miss the other
robot in step k. Hence, the best competitive ratio for any symmetric strategy is
bounded below by

\[ C = \sup_{k \ge 0} \max\{C_1^k, C_2^k\}. \]



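If we make the variable substitutions $z_{2k} = \sum_{i=0}^{k} y_i$ and $z_{2k+1} = U_{k+1}$, for
all $k \ge 0$, and let $Z = (z_0, z_1, \ldots)$, we can express $\max\{C_1^k, C_2^k\}$, $k \ge 2$, as the
functional

\[ 1 + 2F_{k-1}(Z) = 1 + 2\max\left\{ \frac{z_{2k-2} + z_{2k-1}}{z_{2k-3}},\; \frac{z_{2k} + z_{2k+1}}{z_{2k} - z_{2k-2}} \right\}, \]

where $z_{2i} > z_{2i-2}$ and $z_{2i+1} > z_{2i-1}$, for all $i \ge 1$. This proves the claim. □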

Now we show the main result on symmetric strategies. 

Theorem 1. There is no symmetric strategy that achieves a better competitive
ratio than 9 to search on two rays in parallel.

Proof. (Sketch) By Lemma 2 the competitive ratio $C_S$ of a symmetric search
strategy S satisfies $C_S \ge \inf_{Z \in \mathcal{Z}} \sup_{k \ge 1} 1 + 2F_k(Z)$. Let $a = \lim_{n \to \infty} (z_n)^{1/n}$.
Using methods by Schuierer [?] it can be shown that there exist two positive
numbers $\gamma_0$ and $\gamma_1$ such that

\[ \sup_{0 \le k < \infty} F_k(Z) \ge \sup_{0 \le k < \infty} F_k(\gamma_0, \gamma_1 a, \gamma_0 a^2, \ldots), \]

where $\gamma_0, \gamma_1 > 0$. Let $\gamma = \gamma_1/\gamma_0$; then,

\[ F_k(\gamma_0, \gamma_1 a, \gamma_0 a^2, \ldots) = \max\left\{ a^2 + \frac{a}{\gamma},\; \frac{a^2 + \gamma a^3}{a^2 - 1} \right\}, \]

which is minimized for $\gamma = a - 1/a$ and $a = \sqrt{2}$ yielding a value of 4, that is,
$\sup_{0 \le k < \infty} 1 + 2F_k(Z) \ge 9$, for all $Z \in \mathcal{Z}$ as claimed. □
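
The final minimization in the proof sketch can be sanity-checked numerically. The following is a rough sketch, not part of the original argument: it assumes the geometric sequence $z_{2k} = \gamma_0 a^{2k}$, $z_{2k+1} = \gamma_1 a^{2k+1}$, for which $F_k$ no longer depends on k, and the helper name `F` and the grid bounds are our own choices.

```python
import math


def F(a, gamma):
    """F_k on the geometric sequence (g0, g1*a, g0*a^2, ...), with gamma = g1/g0."""
    first = a * a + a / gamma
    second = (a * a + gamma * a ** 3) / (a * a - 1)
    return max(first, second)


a_opt = math.sqrt(2)
gamma_opt = a_opt - 1 / a_opt
best = F(a_opt, gamma_opt)

# Coarse grid search over 1 < a <= 3 and 0 < gamma <= 3 (step 0.005).
grid_min = min(
    F(1 + i / 200, j / 200)
    for i in range(1, 401)
    for j in range(1, 601)
)
```

At $a = \sqrt{2}$ and $\gamma = a - 1/a$ both arguments of the maximum coincide, and no grid point falls below that common value, which agrees with the bound $1 + 2 \cdot 4 = 9$.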



Theorem 2. There is no monotone or symmetric strategy that achieves a better 
competitive ratio than 9 to search on m rays in parallel. 

Proof. We consider monotone strategies first. It is obvious that if the robots 
have further information about the location of the target, then the competitive 
ratio for a strategy that exploits this information does not increase. Assume 
that the robots know that the target is on one of the two first rays. They can 
all explore these rays monotonically in common. Consider now the strategy we 
get by taking the furthest robot from the origin on each of the two rays. This 
strategy is a monotone strategy for two robots on two rays and by Lemma 1 no
such strategy can do better than a competitive ratio of 9. 

For symmetric strategies we can argue in a similar manner. Once a robot has 
found the target, the competitive ratio is bounded below by the time it takes 
to fetch all the other robots and to go back to the target. This ratio is bounded 
below by the time it takes for the robot that found the target to fetch one other 
robot and for it to go to the target. By Theorem 1 this competitive ratio is
bounded below by 9. □ 



Now, there is a strategy that achieves a competitive ratio of 9 to search on 
m rays in parallel. The strategy is known as the doubling strategy and goes as 
follows. Each robot starts by going one unit at full speed on its ray and then goes 
back to the origin. Then they each go two units, four units, and so on, on their 
corresponding ray, always doubling the distance travelled and repeatedly going 
back to the origin. Once a robot finds the target, it goes back at full speed to 
the origin and waits there until the other robots reach it. It then communicates 
the location of the target to the other robots and they all move at full speed to 
the location. The competitive ratio of the doubling strategy, if the target is
at distance D from the origin, is

\[ C \le \sup_{D \in ]2^{k-1}, 2^k]} \frac{2\sum_{i=0}^{k} 2^i + D}{D}
     \le 1 + \sup_{D \in ]2^{k-1}, 2^k]} \frac{2\sum_{i=0}^{k} 2^i}{D}
     \le 1 + \frac{2\sum_{i=0}^{k} 2^i}{2^{k-1}}
     = 1 + 2\,\frac{2^{k+1} - 1}{2^{k-1}}
     = 9 - \frac{1}{2^{k-2}} < 9. \]

We have proved the following theorem.



Theorem 3. The doubling strategy achieves a competitive ratio of 9 to search 
on m rays in parallel given a lower bound on the distance to the target. 
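
As a rough numerical illustration of Theorem 3, the doubling strategy can be evaluated directly. The sketch assumes $D_{\min} = 1$ and that the finder is back at the origin no later than the end of the round in which it found the target, as described above; the function names are hypothetical.

```python
def doubling_total_time(D):
    """Time until all robots stand on a target at distance D >= 1.

    Round i explores distance 2**i at full speed and returns to the origin.
    """
    if D < 1:
        raise ValueError("the doubling strategy assumes D >= D_min = 1")
    k = 0
    while 2 ** k < D:          # smallest k with D <= 2**k: the target is found in round k
        k += 1
    # All robots are together at the origin when round k ends.
    rendezvous = 2 * sum(2 ** i for i in range(k + 1))
    return rendezvous + D      # then everybody runs to the target


def doubling_ratio(D):
    """Competitive ratio of the doubling strategy for a target at distance D."""
    return doubling_total_time(D) / D
```

The ratio stays strictly below 9 and approaches 9 for targets placed just beyond a power of two, which matches the bound $9 - 1/2^{k-2}$.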
4 Without Lower Bound on the Minimum Distance



In this section we consider the problem of a group of m robots searching for
a target of unknown location on m rays in parallel when no lower bound on the
distance from the origin to the target is known.

We begin by presenting a strategy that achieves a competitive ratio of

\[ 1 + 2\,\frac{(k+1)^{k+1}}{k^k}, \]

where $k = \lceil \log m \rceil$. We then show that, in fact, no strategy can do better.



4.1 The Strategy 



The optimal strategy is a monotone strategy where all the robots move, one on
each ray, with a constant speed v. When one robot finds the target it searches
for a robot at full speed to tell it where the target is located. Then they both
go at full speed to search for two more robots and tell them the location of the
target, and so on. After each step the number of robots that know the location
of the target is doubled. Once all robots know the location, they all move to the
target. Suppose the target is on some ray and at distance D from the origin.
The strategy consists of steps. Step i starts when $2^i$ robots know the location of
the target and ends when $2^{i+1}$ robots know the location of the target; that is,
in step i the $2^i$ robots that currently know the position of the target chase $2^i$ of
those robots that do not. Let $T_i$ denote the time it takes to complete step i. It
takes any of the robots D/v time to find the target, and when all robots know
the location of the target, it takes them time $T_F$ to go to the target. Hence, the
competitive ratio of the strategy is

\[ C = \frac{D/v + \sum_{i=0}^{k-1} T_i + T_F}{D}, \]

where $k = \lceil \log m \rceil$.

We can show that

\[ \frac{D/v + \sum_{i=0}^{k-1} T_i + T_F}{D} = 1 + 2\,\frac{(k+1)^{k+1}}{k^k} \]

if the speed is set to $v = 1/(2k+1)$.

With this given we have proved the following result.



Theorem 4. There is a monotone strategy that achieves a competitive ratio of

\[ 1 + 2\,\frac{(k+1)^{k+1}}{k^k}, \]

where $k = \lceil \log m \rceil$, to search on m rays in parallel, if no lower bound on the
distance to the target is known.
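
The value in Theorem 4 can be reproduced by simulating the relay rounds with exact rational arithmetic. This is a sketch under our own assumptions: the target distance is normalised to D = 1, every informed robot catches an uninformed one at the same moment in each round, and m ≥ 2. The names `rounds_needed` and `relay_ratio` are illustrative and do not appear in the paper.

```python
from fractions import Fraction


def rounds_needed(m):
    """k = ceil(log2(m)): number of doubling rounds until all m robots are informed."""
    if m < 2:
        raise ValueError("expects m >= 2")
    return (m - 1).bit_length()


def relay_ratio(k, v):
    """Competitive ratio of the constant-speed strategy with k relay rounds."""
    v = Fraction(v)
    if not 0 < v < 1:
        raise ValueError("speed must satisfy 0 < v < 1")
    elapsed = 1 / v                  # the finder reaches the target (D = 1)
    pos = Fraction(1)                # distance of every robot from the origin
    for _ in range(k):
        chase = 2 * pos / (1 - v)    # gap 2*pos, closing speed 1 - v
        pos += v * chase             # uninformed robots keep moving outwards
        elapsed += chase
    elapsed += pos + 1               # run back through the origin to the target
    return elapsed
```

In this model the ratio equals $1 + (1+v)^{k+1}/(v(1-v)^k)$, which is smallest at $v = 1/(2k+1)$; for k = 1 this gives the familiar speed 1/3 and ratio 9.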
4.2 Lower Bound



We consider first the parallel search problem for monotone strategies and prove
a lower bound on the competitive ratio for these strategies. As time passes the
robots move continuously and monotonically with some speed along the rays
until one of them has found the target. This robot now travels at full speed to
one of the other robots and communicates to it the location of the target, they
both travel to other robots to communicate the location of the target, and so
on. When all robots know where the target is they all go to this target point.

Let $v_1(T), v_2(T), \ldots, v_m(T)$ be the average speeds of the m robots at time
T, i.e., the coordinate position of the robot on its ray at this time divided by the
time. It is clear that a search strategy is completely specified by the m average
speed functions. We have the following lower bound.



Lemma 3. There is no monotone strategy that achieves a better competitive
ratio than

\[ 1 + 2\,\frac{(k+1)^{k+1}}{k^k}, \]

where $k = \lceil \log m \rceil$, to search on m rays in parallel.

We are now in a position to prove the following theorem. 



Theorem 5. There is no strategy whatsoever that achieves a better competitive
ratio than

\[ 1 + 2\,\frac{(k+1)^{k+1}}{k^k}, \]

where $k = \lceil \log m \rceil$, to search on m rays in parallel.



Proof. Any strategy can be given by the average speed functions $v_1(T), \ldots, v_m(T)$
of the m robots (in conjunction with information about whether a robot switches
ray at some point in time). Given these average speed functions, an adversary
can extract information about the length of time that a robot moves monotonically
along a ray. (This includes also the time that a robot stands still at the origin.)
Let $T_i$, for $1 \le i \le m$, denote the time that robot i moves monotonically along a
ray. Each $T_i$ is greater than 0 since either the robot stands still or moves along
some ray. If $T_{\min} = \min_{1 \le i \le m}\{T_i\}$, then we let the adversary place the target on
some ray, say ray 1, at distance $D = v_1(T_D)\,T_D$, such that $T_D <$ [...] for
$k = \lceil \log m \rceil$. If the strategy uses more than [...] time, then the competitive ratio
is trivially bounded from below by

\[ 1 + 2\,\frac{(k+1)^{k+1}}{k^k} \]

for $k = \lceil \log m \rceil$. If, on the other hand, the strategy uses less than [...] time,
then the strategy is monotone in the interesting time interval and we can apply
Lemma 3, proving our claim. □
5 Conclusions



We considered search strategies for parallel search on m concurrent rays. We
show that a straightforward generalization of the so-called doubling strategy,
from searching on the line to searching on m concurrent rays, yields a competitive
ratio of 9 if a minimum distance from the origin to the target is known in advance.
Furthermore, we prove that 9 is a lower bound on the competitive ratio for both
monotone and symmetric strategies in this case.

We also prove a lower bound of

\[ 1 + 2\,\frac{(k+1)^{k+1}}{k^k} \]

on the competitive ratio, if a minimum distance from the origin to the target is
not known in advance. Finally, we give a search strategy that achieves this ratio
regardless of whether such a minimum distance is known or not, giving us an
optimal search strategy in the latter case.
Abstract. The paper gives a logical characterisation of the class
NTIME(n) of problems that can be solved on a nondeterministic Turing
machine in linear time. It is shown that a set L of strings is in this
class if and only if there is a formula of the form
$\exists f_1 \cdots \exists f_k\, \exists R_1 \cdots \exists R_m\, \forall x\, \varphi$
that is true exactly for all strings in L. In this formula the $f_i$ are unary
function symbols, the $R_i$ are unary relation symbols and $\varphi$ is a
quantifier-free formula. Furthermore, the quantification of functions is restricted to
non-crossing, decreasing functions and in $\varphi$ no equations in which different
functions occur are allowed. There are a number of variations of this
statement, e.g., it holds also for k = 3. From these results we derive an
Ehrenfeucht game characterisation of NTIME(n).



1 Introduction 

Since Fagin's seminal result that NP is the class of problems that can be
described by an existential second-order (ESO) formula [?] there have been several
characterisations of subclasses of NP by sublogics of ESO [?]. Lynch showed
that all problems in NTIME(n), i.e., all problems that can be solved in linear
time on nondeterministic Turing machines, can also be expressed by a monadic
ESO formula in the presence of a built-in addition relation. Although NTIME(n)
is a relatively robust class, in order to capture the notion of linear time as used
in algorithm design, Turing machines seem far too restrictive: apparently, simple
operations, such as traversing a tree, cannot be done. Therefore, a number of
alternative models have been proposed to capture this notion. Notably, in a series
of papers [?], Grandjean introduced and investigated linear-time classes
DLIN and NLIN, based on deterministic and nondeterministic random access
machines. NLIN contains NTIME(n) as a subclass but it is not known whether this
inclusion is strict. Grandjean proved that Lynch's logic even captures (at least)
all the languages in NLIN, hence indicating that this logic probably does not
exactly characterize NTIME(n). He also showed that a set L is in NLIN if and
only if it is the set of models of formulas of the form
$\exists f_1 \cdots \exists f_k\, \forall x\, \varphi$, where $\varphi$ is
quantifier-free and the second-order quantifiers $\exists f_i$ range over unary functions.

* full paper: 

ftp://ftp.informatik.uni-mainz.de/pub/publications/misc/Schwentick/stacs99_NTimeN_full.ps



C. Meinel and S. Tison (Eds.): STACS’99, LNCS 1563, pp. 143-^221 1999- 
(c) Springer-Verlag Berlin Heidelberg 1999
In the present paper, we give an exact characterisation of NTIME(n). The
motivation behind exact logical characterisations of (presumably) weaker and
weaker complexity classes is the hope that they might enable lower bound proofs
by methods of Finite Model Theory like Ehrenfeucht games (cf. [?]). Such games
have been successfully used in non-expressibility results for similar logics (for a
survey see, e.g., [?]).

Our characterisation is obtained by restricting Grandjean's logic¹ for NLIN.
The main difference concerns the function quantifiers. In our logic, quantification
of functions is restricted to non-crossing functions, i.e., functions on $\{1, \ldots, n\}$
whose graph, when drawn in the upper half plane with vertices on the line,
has no crossing arcs. Such functions have, in different guises, been used before
to describe computations, e.g., they play an important part in the lower bound
proof of [?], and in the separation of DTIME(n) from NTIME(n), proved in [?].
In the form of matchings, they were used in a logical characterisation of
context-free languages in [?]. In fact, we make use of the close connection between
context-free languages and NTIME(n), which is expressed in the theorem of [3],
stating that a set is in NTIME(n) iff it is the projective image of the intersection
of three context-free languages. To be more precise, we show that a set of strings
can be recognised by a nondeterministic Turing machine in linear time iff it is the
set of models of a formula
$\exists f_1 \cdots \exists f_k\, \exists R_1 \cdots \exists R_m\, \forall x\, \varphi$,
where $\varphi$ is a quantifier-free
formula (with certain syntactical restrictions), the second-order quantifiers $\exists R_i$
range over unary relations, and the function quantifiers $\exists f_i$ range over decreasing
non-crossing functions only. By restricting the number, k, of function variables,
we obtain a strict hierarchy of classes from k = 0 to k = 3: k = 0 characterises
the regular languages [?], k = 1 the context-free languages, and for $k \ge 3$,
we obtain NTIME(n). Using the lower bound from [?] it can be seen that the
class of languages defined by formulas with k = 2 function quantifiers lies strictly
between context-free languages and NTIME(n).

Our logic is fairly robust, allowing a number of variations. If we want to show 
that some set is contained in NTIME(n), we can do this by using a rather liberal 
syntax. On the other hand, in order to show that some set is not contained in 
NTIME(n), the more constrained our formula class, the better. We present an 
Ehrenfeucht game for NTIME(n), based on our most restrictive characterisation, 
in which the players play only three rounds of rather restricted moves. 

2 Preliminaries 

2.1 Strings and Structures

Let $\Sigma$ be an alphabet (i.e. a finite nonempty set). By $\varepsilon$ we denote the empty
string, $\Sigma^*$ is the class of all finite strings over $\Sigma$, and $\Sigma^+ := \Sigma^* \setminus \{\varepsilon\}$. By $|w|$ we
denote the length of a string $w \in \Sigma^*$. The signature $\tau_\Sigma$ associated to an alphabet
$\Sigma$ consists of two constant symbols min and max, unary function symbols s and

¹ Note however, that Grandjean's encoding of strings as structures is different from
ours, which is the straightforward one.
p, and a unary relation symbol $W_\sigma$ for each letter $\sigma \in \Sigma$. With each string
$w = w_1 \cdots w_n \in \Sigma^+$ we associate the $\tau_\Sigma$-structure
$\underline{w} := \langle [n], 1, n, s, p, (W_\sigma)_{\sigma \in \Sigma} \rangle$,
where [n] is an abbreviation for $\{1, \ldots, n\}$, $s(i) := i + 1$ for $i < n$, $s(n) := n$,
$p(i) := i - 1$ for $i > 1$, $p(1) := 1$, and, for every $\sigma \in \Sigma$, $W_\sigma := \{i \in [n] \mid w_i = \sigma\}$.
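
To make the encoding concrete, the structure associated with a string can be built as follows. This is only an illustrative sketch; the dictionary layout and the name `string_structure` are our own choices, and letters of the alphabet that do not occur in the string (whose predicates are empty) are simply omitted.

```python
def string_structure(w):
    """Build the structure for a nonempty string w over positions 1..n."""
    n = len(w)
    if n == 0:
        raise ValueError("structures are only defined for nonempty strings")
    universe = range(1, n + 1)
    s = {i: min(i + 1, n) for i in universe}   # successor, with s(n) = n
    p = {i: max(i - 1, 1) for i in universe}   # predecessor, with p(1) = 1
    W = {}
    for i, letter in enumerate(w, start=1):    # W[letter] = positions carrying that letter
        W.setdefault(letter, set()).add(i)
    return {"universe": list(universe), "min": 1, "max": n, "s": s, "p": p, "W": W}
```

For example, `string_structure("abba")` has universe {1, 2, 3, 4}, with the letter a at positions 1 and 4 and the letter b at positions 2 and 3.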

2.2 Formulas 

We consider structures with unary functions, unary relations and constants.
Consequently, terms are built from variables, constants, and function symbols,
and an atomic formula is either a term equality or a unary relation symbol
(also called a predicate) applied to a term. We will consistently use $g, f, f_i$ for
function symbols, $R, Q, R_i, Q_i, W_\sigma$ for unary relation symbols. Our logics will
be fragments of existential second-order logic, obtained by both syntactic and
semantic restrictions. If $\Phi$ is a class of formulas, we write $\forall x\,\Phi$ for the class of all
those formulas of the form $\forall x\,\varphi$, where $\varphi \in \Phi$. Analogously we use notations like
$\exists f\,\Phi$ and $\exists R\,\Phi$. Thus, as an example, $\psi \in \exists f_1 f_2 f_3\, \exists R\, \forall x\, \mathrm{FO}$ means
that $\psi$ is of the form $\exists f_1 \exists f_2 \exists f_3\, \exists R_1 \cdots \exists R_m\, \forall x\, \varphi$, for some $m \ge 0$, where $\varphi$ is a
first-order formula. Our semantic restrictions concern the scope of the function
quantifiers $\exists f_i$. Let, for every n, $\mathcal{F}_n$ be a class of functions on [n], $\mathcal{F} := \bigcup_n \mathcal{F}_n$.

For a formula $\varphi$ and a string w we write $\underline{w} \models^{\mathcal{F}} \exists f_1 \cdots \exists f_k\, \varphi$ iff there are functions
$f_1, \ldots, f_k \in \mathcal{F}_{|w|}$ such that $\langle \underline{w}, f_1, \ldots, f_k \rangle \models \varphi$. Accordingly, if $\psi = \exists f_1 \cdots \exists f_k\, \varphi$
we write $\mathrm{Mod}^{\mathcal{F}}(\psi)$ for the $\mathcal{F}$-model set of $\psi$, i.e., the set $\{w \mid \underline{w} \models^{\mathcal{F}} \psi\}$, and for
a class F of formulas, $\mathrm{MOD}^{\mathcal{F}}(F) := \{\mathrm{Mod}^{\mathcal{F}}(\psi) \mid \psi \in F\}$.

2.3 Non-crossing Functions

Let $f : [n] \to [n]$ be a non-increasing function, i.e., $f(j) \le j$, for all $j \in [n]$.
We call f non-crossing, iff for all $j, j' \in [n]$ such that $f(j) < j' \le j$, it holds
that $f(j') \ge f(j)$. Let NNC denote the class of all non-increasing, non-crossing
functions, and DNC the class of all decreasing non-crossing functions (where
additionally to $f \in \mathrm{NNC}$ we require that $f(j) < j$, for all $j > 1$). Instead
of $f \in \mathrm{NNC}$ ($f \in \mathrm{DNC}$) we shall also say f is nnc (f is dnc). With regard
to expressive power, in our context, the difference between NNC and DNC is
immaterial, as the following lemma shows.

Lemma 1. Let $F_k := \exists f_1 \cdots f_k\, \exists R\, \forall x\, \mathrm{FO}$. Then $\mathrm{MOD}^{\mathrm{NNC}}(F_k) = \mathrm{MOD}^{\mathrm{DNC}}(F_k)$.

For the inclusion "⊆" we represent a function $f_i \in \mathrm{NNC}$ by a function $f_i' \in \mathrm{DNC}$
together with a set 0 as follows: 

w) »/.«=., >nd 

Lemma 1 allows us to use either class, depending on which suits our purposes
best. The more restrictive class DNC has some particularly useful properties.

Lemma 2. A function $f : [n] \to [n]$ is dnc iff $f(1) = 1$, and for every $j > 1$
there is a number $q \ge 0$ such that $f(j) = f^q(j - 1)$. Figure 1 (a) gives an
illustration.

Fig. 1. An illustration of Lemma 2
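
The characterisation in Lemma 2 can be checked exhaustively for small n. The sketch below is an assumption-laden illustration: functions on [n] are represented as dictionaries, the two definitions are implemented literally, and all helper names are hypothetical.

```python
from itertools import product


def is_dnc(f, n):
    """Decreasing and non-crossing, following the definitions of Section 2.3."""
    if any(f[j] > j for j in range(1, n + 1)):        # non-increasing
        return False
    if any(f[j] >= j for j in range(2, n + 1)):       # decreasing for j > 1
        return False
    for j in range(1, n + 1):
        for j2 in range(f[j] + 1, j + 1):             # all j2 with f(j) < j2 <= j
            if f[j2] < f[j]:
                return False
    return True


def orbit_condition(f, n):
    """f(1) = 1 and every f(j), j > 1, lies in the orbit of j - 1 under f."""
    if f[1] != 1:
        return False
    for j in range(2, n + 1):
        x = j - 1
        orbit = {x}
        while f[x] not in orbit:                      # stop once the orbit closes
            x = f[x]
            orbit.add(x)
        if f[j] not in orbit:
            return False
    return True


def all_functions(n):
    """All functions from [n] to [n] as dictionaries."""
    for values in product(range(1, n + 1), repeat=n):
        yield dict(zip(range(1, n + 1), values))


def lemma_holds(n):
    return all(is_dnc(f, n) == orbit_condition(f, n) for f in all_functions(n))


def count_dnc(n):
    return sum(1 for f in all_functions(n) if is_dnc(f, n))
```

Read as a parent map, a dnc function is the preorder numbering of an ordered tree rooted at position 1, which is one way to see why both conditions agree.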



3 The Main Result 

Let $\Phi_k$ be the class of quantifier-free formulas over the signature $\tau_\Sigma$ which have
as free variables at most one FO variable x, at most k unary function symbols
$f_1, \ldots, f_k$, and an arbitrary number of unary relation symbols, and in which no
term equation contains occurrences of more than one of the symbols $f_1, \ldots, f_k$.
Let $\Phi := \bigcup_k \Phi_k$.

Theorem 1. $\mathrm{NTIME}(n) = \mathrm{MOD}^{\mathrm{DNC}}(\exists f\, \exists R\, \forall x\, \Phi)$.

The proof is given in the next subsections. We first show in Subsection 3.1 that
every context-free language is the DNC-model set of a formula in $\exists f\, \exists R\, \forall x\, \Phi_1$.
Together with the fact that every language in NTIME(n) is the image under
a projection of the intersection of three context-free languages (cf. [3]) this
proves the inclusion from left to right (even with three function variables), see
Proposition 2. For the other direction, in Subsection 3.2 (Proposition 3), we first
transform every formula in $\exists f_1 \cdots f_k\, \exists R\, \forall x\, \Phi_k$ into one whose atoms are all of a very
restricted form, without changing its DNC-model set. Finally, in Subsection 3.3
(Proposition 4), we construct, for every such formula, a nondeterministic, linear
time Turing machine which evaluates the formula on its input string.

3.1 Expressing Derivations 

Concerning context-free grammars, we use the standard notation of [?]. Let
us briefly recall the notion of a derivation tree for a context-free grammar
$G = (V, \Sigma, P, S)$. Every $\sigma \in \Sigma$ is a derivation tree for $\sigma \Rightarrow^* \sigma$. If $\alpha_i \in \Sigma \cup V$,
$A \to \alpha_1 \cdots \alpha_r$ is a production in P, and $T_i$ is a derivation tree for $\alpha_i \Rightarrow^* u_i$, $u_i \in \Sigma^+$,
then $A(T_1, \ldots, T_r)$ is a derivation tree for $A \Rightarrow^* u_1 \cdots u_r$, where by $A(T_1, \ldots, T_r)$
we denote the ordered tree whose root is labelled with A and has r subtrees
$T_1, \ldots, T_r$.

Proposition 1. If a language $L \subseteq \Sigma^+$ is context-free, then $L = \mathrm{Mod}^{\mathrm{DNC}}(\psi)$
for a $\tau_\Sigma$-formula $\psi \in \exists f\, \exists R\, \forall x\, \Phi_1$.

Proof. As the constructions of the proof of Lemma 1 do not essentially change
the structure of the FO formulas, it suffices to find a formula $\psi$ such that
$L = \mathrm{Mod}^{\mathrm{NNC}}(\psi)$. Let L be generated by some grammar $G = (V, \Sigma, P, S)$ in quadratic
double Greibach normal form, i.e., by a grammar whose productions are of the
form $A \to \alpha$ where $\alpha \in \Sigma \cup \Sigma\Sigma \cup \Sigma V \Sigma \cup \Sigma V V \Sigma$ ([2], Theorem 3.3). We have
to find a formula $\psi$ such that for all $w \in \Sigma^+$ we have $\underline{w} \models^{\mathrm{NNC}} \psi$ iff $S \Rightarrow^* w$.

Let $w \in L$ be of length n, and let T be a derivation tree for $S \Rightarrow^* w$. As G is
in quadratic double Greibach normal form, the leftmost and the rightmost child
of each internal node of T are leaves, labelled with terminal symbols; and every
position in w corresponds to a leftmost or to a rightmost child of an internal
node of T. We can thus represent T by an nnc-function f and sets $Q_{A \to \alpha}$ for
each production $A \to \alpha$ in P as follows: For all $j \in [n]$ we define

\[ f(j) := \begin{cases}
  i & \text{if } i \text{ corresponds to the leftmost, } j \text{ to the rightmost child of the same internal node of } T, \\
  j & \text{if no such } i \text{ exists,}
\end{cases} \]

$Q_{A \to \alpha}(j)$ :⟺ j corresponds to the rightmost child of an internal node of
T which is associated to the production $A \to \alpha$, i.e., which
is labelled with A, has exactly $|\alpha|$ children, the ith of which
is labelled with the ith letter in $\alpha$ (for all $1 \le i \le |\alpha|$).

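As one can easily see, $\langle \underline{w}, f, (Q_{A \to \alpha})_{(A \to \alpha) \in P} \rangle$ satisfies the formula

\[ \varphi_{\mathrm{tree}} := \forall x \Big( \varphi_{\mathrm{disjoint}} \wedge \varphi_{\mathrm{start}} \wedge \bigwedge_{(A \to \alpha) \in P} \big( Q_{A \to \alpha}(x) \to \varphi_{A \to \alpha} \big) \Big), \quad \text{where} \]

\[ \varphi_{\mathrm{disjoint}} := \bigwedge_{q \ne q' \in P} \neg\big( Q_q(x) \wedge Q_{q'}(x) \big) \]

\[ \varphi_{\mathrm{start}} := x = \max \to \Big( fx = \min \wedge \bigvee_{(S \to \alpha) \in P} Q_{S \to \alpha}(x) \Big) \]

\[ \varphi_{A \to \sigma} := W_\sigma(x) \wedge fx = x \]

\[ \varphi_{A \to \sigma\tau} := W_\sigma(fx) \wedge W_\tau(x) \wedge fx = px \ne x \]

\[ \varphi_{A \to \sigma B \tau} := W_\sigma(fx) \wedge W_\tau(x) \wedge fx = pfpx \ne fpx \wedge \bigvee_{(B \to \beta) \in P} Q_{B \to \beta}(px) \]

\[ \varphi_{A \to \sigma C B \tau} := W_\sigma(fx) \wedge W_\tau(x) \wedge fx = pfpfpx \ne fpfpx \wedge \Big( \bigvee_{(B \to \beta) \in P} Q_{B \to \beta}(px) \Big) \wedge \Big( \bigvee_{(C \to \gamma) \in P} Q_{C \to \gamma}(pfpx) \Big). \]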

We thus obtain $\underline{w} \models^{\mathrm{NNC}} \psi$ where $\psi := \exists f\, (\exists Q_q)_{q \in P}\, \varphi_{\mathrm{tree}}$. For the
opposite direction let w be a string of length n, let f be an nnc-function on
[n], and let $(Q_{A \to \alpha})_{(A \to \alpha) \in P}$ be subsets of [n] such that $\varphi_{\mathrm{tree}}$ is satisfied by
$\langle \underline{w}, f, (Q_{A \to \alpha})_{(A \to \alpha) \in P} \rangle$. We have to show that $S \Rightarrow^* w$, i.e., that $w \in L$. For
all $j \in [n]$ such that $Q_{A \to \alpha}(j)$ for some $(A \to \alpha) \in P$ we define a tree T(j) as
follows:

If $Q_{A \to \sigma}(j)$ then $T(j) := A(\sigma)$,
if $Q_{A \to \sigma\tau}(j)$ then $T(j) := A(\sigma, \tau)$,
if $Q_{A \to \sigma B \tau}(j)$ then $T(j) := A(\sigma, T(p(j)), \tau)$,
if $Q_{A \to \sigma C B \tau}(j)$ then $T(j) := A(\sigma, T(pfp(j)), T(p(j)), \tau)$.

By a straightforward induction on the depth of T(j) one can easily show that
if $Q_{A \to \alpha}(j)$, then T(j) is a derivation tree for $A \Rightarrow^* w_{f(j)} \cdots w_j$. As $\varphi_{\mathrm{start}}$
guarantees that $f(n) = 1$ and that $Q_{S \to \alpha}(n)$ for some production $S \to \alpha$ in P,
we conclude that T(n) is a derivation tree for $S \Rightarrow^* w$, and hence $w \in L$. □

Let $\Delta$ and $\Sigma$ be alphabets. A mapping $h : \Delta \to \Sigma$ is called a projection. We
extend h to map strings in the canonical way: $h(w_1 \cdots w_n) := h(w_1) \cdots h(w_n)$.
If $L \subseteq \Delta^*$ then the set $h(L) := \{h(w) \mid w \in L\}$ is called the projection of L
under h.²

Theorem 2 ([3], Theorem 4.1). A language L is in NTIME(n) if and only 
if L is a projection of the intersection of three context-free languages. 

Hence, from Proposition Q we obtain the following: 

Proposition 2. If a language L C E~^ is in NTIME(n), then L = {fi) 

for a Ts-formula G 3/1/2/3 3R\/x<p3. 

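Proof. From Theorem 2 we obtain an alphabet $\Gamma$, context-free languages
$L_1, L_2, L_3 \subseteq \Gamma^+$, and a projection $h : \Gamma \to \Sigma$ such that $L = h(L_1 \cap L_2 \cap L_3)$.
W.l.o.g., $\Gamma \cap \Sigma = \emptyset$. Proposition 1 provides formulas $\psi_i \in \exists f\, \exists R\, \forall x\, \Phi_1$ over the
signature $\tau_\Gamma$ such that $L_i = \mathrm{Mod}^{NNC}(\psi_i)$ for $i \in \{1, 2, 3\}$. We define a formula

$$\psi := (\exists W_s)_{s\in\Gamma}\, (\psi_1 \wedge \psi_2 \wedge \psi_3) \wedge \Big( \forall x \bigvee_{s\in\Gamma} \big( W_s(x) \wedge W_{h(s)}(x) \wedge \bigwedge_{s' \neq s} \neg W_{s'}(x) \big) \Big),$$

which holds for a string $w \in \Sigma^+$ iff there exists a string $\tilde{w} \in \Gamma^+$ of the same
length, such that $\tilde{w}$ satisfies $\psi_1$, $\psi_2$, and $\psi_3$, and $h(\tilde{w}) = w$. Hence we obtain
$\mathrm{Mod}^{NNC}(\psi) = h(L_1 \cap L_2 \cap L_3) = L$. Furthermore, $\psi$ can easily be transformed
into a formula in $\exists f_1 f_2 f_3\, \exists R\, \forall x\, \Phi_3$. □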

3.2 Simplifying Formulas 

In this subsection we will prove the following proposition. 

Proposition 3. For every formula if G 3 /i- • /^ 3 PVxl?fc there is a formula 
if' G 3 /i- • fk 3 PVa;^fc such that Mod^^^{if') = ModP^^ {ip) , and the atoms of 
if' are of the following forms: 

— x=min, x=max, fiX=fiPX (where i G [k] and 0), 

— Q(x), Q(sx), Q(px), Qifix) (where i G [A:] and Q is a unary relation sym- 
bol). 

Proof, (sketch) The proof proceeds in several steps: 

1. We replace every equational atom $tx = t'x$ by a new relational atom $Q_{\{t,t'\}}(x)$
(with the intended meaning that $Q_{\{t,t'\}}(j) \iff t(j) = t'(j)$), and we replace
$t\mu = y$ (for $\mu \in \{\min, \max\}$ and an arbitrary term $y$) by the new relational
atom $Q_{t\mu}(y)$ (with the intended meaning that $Q_{t\mu}(j) \iff t(\mu) = j$).

^ In the literature, projections are sometimes called length-preserving homomorphisms. 
2. We eliminate all predicates where ^ {/i,/fp} by replacing 

them with adequate formulas. (In this step we make essential use of the 
special properties of dnc-functions). 

3. We take the conjunction of the resulting formula with formulas that define 
the predicates Q{/j,/«p} and Qt^- After this step, all equations of the resulting 
formula are of the desired form. 

4. We replace every relational atom $Q(tx)$ by a new relational atom $Q_t(x)$ (with
the intended meaning that $Q_t(j) \iff Q(t(j))$) and take the conjunction of the
resulting formula with formulas that define the predicates $Q_t$.

5. A similar construction is performed to replace atoms of the form $Q(t\min)$
and $Q(t\max)$.

More details can be found in the full version of the paper. □ 

3.3 Evaluating Formulas 

We now conclude the proof of Theorem 1 by showing how a nondeterministic
Turing machine can evaluate formulas of the form given in Proposition 3.

Proposition 4. Let ip be of the form 3/i- ■3fk3Ri - - BRmyx cp, where (p is a 
quantifier-free formula in which all atoms are of one of the following forms: 

— x=min, x=max, fiX=ffpx (where i G [k] and 0), 

— Q{x), Q{sx), Q{px), Q{fix) (where i S [fc] and Q is a unary relation 
symbol). 

Then there is a nondeterministic, linear time Turing machine which accepts 
precisely those strings w for which w 

Proof. The machine, $M$, will scan its input from left to right, guessing the values
of all the relations, and evaluating $\psi$ accordingly. Since writing down (and
reading) the values $f_i(j)$ would take too long ($\Omega(\lg n)$ steps per input position), $M$
represents these values indirectly, by the movements of pushdown heads. To this
end, we equip $M$ with $k$ pushdown tapes, one for each of the function variables
$f_i$. When scanning the input at position $j$, pushdown tape $i$ will hold information
about $f_i(j), f_i^2(j), \ldots, 1$, in this order. More precisely, on input $w$, $M$ proceeds in
$n = |w|$ "metasteps", where in the $j$th metastep it looks at the $j$th input symbol
$w_j$. It maintains three variables, $P^-, P, P^+$, where $P^- = ((r_1^-, \ldots, r_m^-), \sigma^-)$
contains the information about the previous position, $j-1$: $r_i^-$ is intended to be
the truth value of $R_i$, and $\sigma^-$ the input symbol, $w_{j-1}$, at that position. $P$ and $P^+$
contain the same information about the current and the next position, respectively.
The details are given in the full paper. □

Remark 1. It should be noted that in the proof of Proposition 4 $M$ scans its
input only once, from left to right, and uses as many pushdown tapes as $\psi$ has
function variables. In particular, for $k = 1$, the formula can be evaluated by a
pushdown automaton, i.e., the language is context-free.

4 Discussion and Further Results

A logical characterisation of a complexity class can expose what is typical for 
that class. Our characterisation shows that, in some sense, non-crossing func- 
tions capture the essence of linear time on nondeterministic Turing machines. 
NTIME(n) allows some variation in the precise definition of Turing machines. 
This is reflected in the logic: we can vary both our syntactical and semantical 
restrictions quite considerably, without changing the class of model sets. Some 
of these variations lead to interesting insights: we obtain a strict hierarchy of 
classes within NTIME(n) and a characterisation of the class by an Ehrenfeucht 
game, which looks considerably easier to play for the duplicator than the game
one obtains directly from Theorem 1.



4.1 Variations 

The following proposition subsumes several possible variations of Theorem P 
where by we denote the class of all Boolean combinations of atoms of the 
forms x=min, x=max, fiX=x, fiX=ffpx, Q{x), Q{gx), for g G {s,p, /i, . . , /fe} 
and unary relation symbols Q. 

Proposition 5. For F G {DJVC, JVJVC} we have 

NTIME(n) = MOD^{3j\/xF,) = MOD^^iJjJRyxF,) = MOD^" (JfififsJRyxF's ) . 

From the results of UBI one obtains a different logic for NTIME(rd- There, 
over the relational signature {<, Wa / cr G A}, it was shown thalQ CEL = 
FO), where MATCFl is essentially the class of all injective, 
partial dnc- functions. Together with the theorem of Book and Greibach (Theo- 
rem 0, it follows that a language L is in NTIME(n) iff L = for 

a formula ijj := 3 Mi 3 M 23 M 33 i?i- • 3Rm{ipi A A (^ 3 ), where < and Mi are the 
only binary relation symbols occurring in the FO-formula pi (for i G {1,2, 3}). 



4.2 Separations 

As stated in Remark 1, the number of function symbols corresponds to the
number of pushdown tapes needed for evaluating a formula (with respect to the
function class $F \in \{DNC, NNC\}$). As a consequence, we obtain a strict hierarchy
of classes from $k=0$ to $k=3$ by restricting the number $k$ of function variables,
as illustrated in the following picture. The class $\mathrm{MOD}^F(\exists f_1 f_2\, \exists R\, \forall x\, \Phi_2)$
is separated from CFL by the language $\{ww \mid w \in \{0,1\}^+\}$, and NTIME(n) is
separated from $\mathrm{MOD}^F(\exists f_1 f_2\, \exists R\, \forall x\, \Phi_2)$ by the result of IT^, which says that
the language SMT (sparse matrix transposition) cannot be accepted with two
pushdown tapes in time $o(n \log n)$, but with three pushdown tapes it can (even
deterministically) be accepted in linear time.

By CFL we denote the class of all context-free languages. 

MOD^(3f3RVx<P,) = MOD^(3/i/ 2/3 aflVxtPs) = NTIME(n) 

_ I 

MOD^(3/i/ 2 aflVa; •P 2 ) 

_ I 

MOD^(a/3flVa:<Pi) = CFL 

_ I 

MOD’^ {3R'ix$o) = REG 



4.3 Games 

Ehrenfeucht games have proved to be a useful tool for showing inexpressibility
results in Finite Model Theory. For a general introduction to Ehrenfeucht games
we refer to the textbooks of Immerman Hg and of Ebbinghaus and Flum jS|. 
Our game for NTIME(n) makes use of the idea of Ajtai and Fagin P| that in 
Ehrenfeucht games for existential second-order logic the duplicator can choose 
the second structure after the spoiler has selected relations (or, in our case, 
functions) for the first structure. From each of the possible logical characterisa- 
tions of NTIME(n) one can derive a corresponding Ehrenfeucht game. We are 
going to describe here one variant that looks particularly easy to play for the 
duplicator. In particular, after choosing functions and colourings, both players 
have to select only one position in each string. A closer inspection of the proof
of Proposition 1 shows that in the characterisation of NTIME(n) we can restrict
ourselves to special nnc-functions $f$ which, apart from being nnc, have the
following properties:

— $f(f(j)) = f(j)$ for all $j \in [n]$,

— for every $j$ there is at most one $j' \neq j$ with $f(j') = j$, i.e., $f$ is one-one, if
we ignore loops $f(j) = j$, and

— the width of each arc $(f(j), j)$ is at most 2, where by width we denote the
number of arcs lying on the surface beneath that arc. To be precise, $(f(j), j)$
has width 0 if $f(j) = j$ or $j-1$, width 1 if $f(j) = f(j-1) - 1$, and width 2 if
$f(j) = f(f(j-1) - 1) - 1$.

The game for a set $L$ of strings consists of the following three rounds:

1. The spoiler chooses a number $m \ge 0$. Afterwards, the duplicator chooses a
string $w \in L$. In the following, let $n$ denote the length of $w$.

2. The spoiler chooses special nnc-functions $f_1, f_2, f_3$ on $[n]$, and colours $w$
with $m$ colours. Afterwards, the duplicator chooses a string $w' \notin L$, special
nnc-functions $f'_1, f'_2, f'_3$ on $[n']$, and $m$ colours on $w'$. (Here, $n'$ denotes the
length of $w'$.)

3. The spoiler chooses $j' \in [n']$. Afterwards, the duplicator chooses $j \in [n]$.

The duplicator wins the game iff the following conditions are satisfied:

— $j = 1$ iff $j' = 1$, and $j = n$ iff $j' = n'$,

— the colour of position $j$ in $w$ is the same as the colour of position $j'$ in $w'$.
The corresponding statements hold for the positions $j-1$ and $j'-1$, and for
$f_i(j)$ and $f'_i(j')$ (for $i \in \{1,2,3\}$).

— The arc $(f_i(j), j)$ has the same width as the arc $(f'_i(j'), j')$, and it is a loop
iff $(f'_i(j'), j')$ is a loop (for $i \in \{1,2,3\}$).

Proposition 6. A set L of strings is in NTIME(n) iff the spoiler has a winning 
strategy in the game for L. 

Acknowledgements We would like to thank Malika More and Arnaud Durand 
for stimulating discussions.

Abstract. Our goal is to study the complexity of infinite binary recursive
sequences. We introduce several measures of the quantity of information
they contain. Some measures are based on the size of programs that
generate the sequence, the others are based on the Kolmogorov complexity
of its finite prefixes. The relations between these complexity measures
are established. The most surprising among them are obtained using a
specific two-player game.



1 Introduction 

The notion of Kolmogorov entropy (=complexity) for finite binary strings was
introduced in the 1960s independently by Solomonoff, Kolmogorov and
Chaitin jam]. There are different versions (plain Kolmogorov entropy, prefix
entropy, etc.; see ^ for the details) that differ from each other by no more than
an additive term logarithmic in the length of the argument. In the sequel we are
using plain Kolmogorov entropy $K(x|y)$ as defined in but similar results can
be obtained for prefix complexity.

When an infinite 0-1-sequence is given, we may study the entropy (=com- 
plexity) of its finite prefixes. If prefixes have high complexity, the sequence is 
random (see ^ for details and references); if prefixes have low complexity, the 
sequence is computable. In the sequel, we study the latter type. 

Let $K(x)$, $K(x|y)$ denote the plain Kolmogorov entropy (complexity) of a
binary string $x$ and the conditional Kolmogorov entropy (complexity) of $x$ when
$y$ (some other binary string) is known. That is, $K(x)$ is the length of the shortest
program $p$ that prints $x$; $K(x|y)$ is the length of the shortest program that prints
$x$ given $y$ as input. (For details see |n| or |0j.)

Let $\omega_{1:n}$ denote the first $n$ bits (= $n$-prefix) of the sequence $\omega$.

Let us recall the following criteria of computability of $\omega$ in terms of entropy
of its finite prefixes.

* The work was done while visiting LIP, Ecole Normale Superieure of Lyon. 



C. Meinel and S. Tison (Eds.): STACS’99, LNCS 1563, pp. 153-^^21 1999. 
(c) Springer- Verlag Berlin Heidelberg 1999 




Bruno Durand, Alexander Shen, and Nikolai Vereshagin 



(a) $\omega$ is computable if and only if $K(\omega_{1:n}|n) = O(1)$. This result is attributed in
0 to A.R. Meyer (see also |9I5|1.

(b) $\omega$ is computable if and only if $K(\omega_{1:n}) \le K(n) + O(1)$ P].

(c) $\omega$ is computable if and only if $K(\omega_{1:n}) \le \log_2 n + O(1)$ [2].

These results provide criteria of the computability of infinite sequences. For
example, (a) can be reformulated as follows: sequence $\omega$ is computable if and
only if $M(\omega)$ is finite, where

$$M(\omega) = \max_n K(\omega_{1:n}|n) = \max_n \min_p \{\, l(p) \mid p(n) = \omega_{1:n} \,\}.$$

($l(p)$ stands for the length of program $p$; $p(n)$ denotes its output on $n$.)

Therefore, $M(\omega)$ can be considered as a complexity measure for $\omega$: $M(\omega)$ is
finite iff $\omega$ is computable.

Another straightforward approach is to define entropy (complexity) of a
sequence $\omega$ as the length of the shortest program computing $\omega$:

$$K(\omega) = \min\{\, l(p) \mid \forall n\; p(n) = \omega_{1:n} \,\}$$

(and by definition $K(\omega) = \infty$ if $\omega$ is not computable).

The difference between $K(\omega)$ and $M(\omega)$ can be explained as follows: $M(\omega) \le m$
means that for every $n$ there is a program of size at most $m$ that computes
$\omega_{1:n}$ given $n$; this program may depend on $n$. On the other hand, $K(\omega) \le m$
means that there is a single such program that works for all $n$. Thus, $M(\omega) \le K(\omega)$
for all $\omega$, and one can expect that $M(\omega)$ may be significantly less than $K(\omega)$.
(Note that the known proofs of (a) give no bounds of $K(\omega)$ in terms of $M(\omega)$.)
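
The gap between "some program for each $n$" and "one program for all $n$" can be made concrete on a toy scale. The sketch below is not Kolmogorov complexity (which is uncomputable): it assumes a small hand-written table of hypothetical programs, each given as a pair (nominal length, function), and only looks at prefixes up to a bound `N`. The names `toy_M` and `toy_K` are introduced for this illustration.

```python
from typing import Callable, List, Optional, Tuple

Program = Tuple[int, Callable[[int], str]]   # (nominal length, n -> output)

def toy_M(programs: List[Program], omega: str, N: int) -> Optional[int]:
    """Max over n <= N of the shortest table entry p with p(n) == omega[:n]."""
    worst = 0
    for n in range(1, N + 1):
        fits = [length for length, run in programs if run(n) == omega[:n]]
        if not fits:
            return None          # no table entry handles this n
        worst = max(worst, min(fits))
    return worst

def toy_K(programs: List[Program], omega: str, N: int) -> Optional[int]:
    """Length of the shortest single table entry correct for every n <= N."""
    fits = [length for length, run in programs
            if all(run(n) == omega[:n] for n in range(1, N + 1))]
    return min(fits) if fits else None

omega = "000" + "1" * 13
programs: List[Program] = [
    (2, lambda n: "0" * n),                                     # right only for n <= 3
    (3, lambda n: "000" + "1" * (n - 3) if n >= 4 else "1" * n),  # right only for n >= 4
    (5, lambda n: ("000" + "1" * n)[:n]),                       # right for every n
]
m_value = toy_M(programs, omega, 16)
k_value = toy_K(programs, omega, 16)
```

Here the two short entries together cover every $n$, so the toy analogue of $M$ is smaller than the toy analogue of $K$, which needs the single longer entry.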

Indeed, Theorem 3 shows that there is no computable bound for $K(\omega)$ in
terms of $M(\omega)$: for any computable function $\alpha(m)$ there exist computable infinite
sequences $\omega^1, \omega^2, \omega^3, \ldots$ such that $M(\omega^m) \le m + O(1)$ and $K(\omega^m) \ge \alpha(m) - O(1)$.

The situation changes surprisingly when we compare “almost all” versions of
$K(\omega)$ and $M(\omega)$ defined in the following way:

$$K_\infty(\omega) = \min\{\, l(p) \mid \forall^\infty n\; p(n) = \omega_{1:n} \,\}$$

$$M_\infty(\omega) = \limsup_n K(\omega_{1:n}|n) = \min\{\, m \mid \forall^\infty n\, \exists p\; (l(p) \le m \text{ and } p(n) = \omega_{1:n}) \,\}$$

($\forall^\infty n$ stands for “for all but finitely many $n$”). It is easy to see that $M_\infty(\omega)$ is
finite only for computable sequences. Indeed, if $M_\infty(\omega)$ is finite, then $M(\omega)$ is
also finite, and the computability of $\omega$ is implied by Meyer’s theorem.

Surprisingly, it turns out that $K_\infty(\omega) \le 2M_\infty(\omega) + O(1)$ (Theorem 5), so the
difference between $K_\infty$ and $M_\infty$ is not as large as between $K$ and $M$. We stress
that this result is rather strange because a multiplicative constant 2 appears,
and has no intuitive meaning taking into account that all the six complexity
measures (“entropies”) mentioned above are “well calibrated” in the following
sense: there are $O(2^m)$ sequences whose entropy does not exceed $m$. In the
general theory of Kolmogorov complexity, additive constants often appear, but
not multiplicative ones. As Theorem 6 shows, this bound is tight.

It is interesting also to compare $K_\infty$ and $M_\infty$ with $K$ and $M$, as well as
with relativized versions of $K$. For any oracle $A$ one may consider a relativized
Kolmogorov complexity allowing programs to access the oracle. Then $K^A(\omega)$
is defined in a natural way. By $K'(\omega)$ [or $K''(\omega)$] we mean $K^A(\omega)$ where $A = 0'$
[or $0''$]. The results of this comparison are shown by a diagram (Fig. 1).

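$$\begin{array}{ccccc}
K'(\omega) & \longleftarrow & K_\infty(\omega) & \longleftarrow & K(\omega) \\
\downarrow & & \downarrow & & \downarrow \\
K''(\omega) & \longleftarrow & M_\infty(\omega) & \longleftarrow & M(\omega)
\end{array}$$

Fig. 1. Relations between different complexity measures for infinite sequences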

Arrows go from the bigger quantity to the smaller one (up to an $O(1)$-term, as
usual). Bold arrows indicate inequalities that are immediate consequences of the
definitions. Other arrows are provided by Theorem 1 ($K'(\omega) \le K_\infty(\omega) + O(1)$)
and Theorem 4 ($K''(\omega) \le M_\infty(\omega) + O(1)$).

As we have said, $K_\infty(\omega) \le 2M_\infty(\omega) + O(1)$, so $K_\infty$ and $M_\infty$ differ only by
a bounded factor. If we ignore such a difference, we get a simplified diagram

$$K''(\omega) \leftarrow K'(\omega) \leftarrow K_\infty(\omega), M_\infty(\omega) \leftarrow M(\omega) \leftarrow K(\omega)$$

where $X \leftarrow Y$ means that $X = O(Y)$.

On the last diagram no arrow could be inverted. Indeed, $K''(\omega)$ is finite while
$K'(\omega)$ is infinite for a sequence $\omega$ that is $0''$-computable but not $0'$-computable.
Therefore the first arrow cannot be inverted. The second one cannot be inverted
for similar reasons: $K'(\omega)$ is finite while $K_\infty(\omega)$ and $M_\infty(\omega)$ are infinite for
a sequence that is $0'$-computable but not computable. Theorem 2 shows that
$K_\infty(\omega)$ and $M_\infty(\omega)$ could be small while $M(\omega)$ is large. Finally, Theorem 3
shows that $M(\omega)$ could be small while $K(\omega)$ is large.

These diagrams and the statements we made about them do not tell us
whether the inequalities $K_\infty(\omega) \le M(\omega) + O(1)$ and $K'(\omega) \le M_\infty(\omega) + O(1)$
are true. The first one is not true, as Theorem 6 implies. We don’t know whether
the second one is true.

Other open questions: (1) is it possible to reverse the second arrow
($K'(\omega) \leftarrow K_\infty(\omega), M_\infty(\omega)$) for computable sequences? (2) what can be said about
similar notions for finite strings? In particular, is $\limsup_n K(x|n)$ equal to
$K'(x) + O(1)$ or not?

^ It was shown recently by the third author that $\limsup_n K(x|n) = K'(x) + O(1)$.

2 Theorems and Proofs

Theorem 1. $K'(\omega) \le K_\infty(\omega) + O(1)$.

Proof. Let $p(n) = \omega_{1:n}$ for almost all $n$. The following program $q$ (with access
to $0'$) computes $\omega_{1:n}$ given $n$: For $k = n, n+1, \ldots$ find out (using $0'$) whether
(a) $p(k)$ is defined and is a binary string of length $k$; (b) $p(m)$ is consistent with
$p(k)$ for all $m > k$; consistency means that either [$p(m)$ has length $m$ and has
prefix $p(k)$] or [$p(m)$ is undefined]. As soon as $k$ satisfying both (a) and (b) is
found, print the first $n$ bits of $p(k)$.

Obviously, $q(n) = \omega_{1:n}$ for all $n$ and the bit length of $q$ is $O(1)$ longer than
that of $p$. □
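
The search performed by $q$ can be imitated on a toy scale. The sketch below is an illustration only: the oracle $0'$ is replaced by a finite search bound `horizon` (an assumption of this example, so the checks are truncated rather than exact), and an undefined value $p(k)$ is modelled by `None`.

```python
from typing import Callable, Optional

def q(p: Callable[[int], Optional[str]], n: int, horizon: int) -> Optional[str]:
    """Bounded stand-in for the oracle program q from the proof.

    p(k) returns a binary string, or None where p is undefined.
    Conditions (a) and (b) are only checked up to `horizon`.
    """
    for k in range(n, horizon + 1):
        x = p(k)
        if x is None or len(x) != k:           # condition (a) fails
            continue
        consistent = all(
            p(m) is None or (len(p(m)) == m and p(m).startswith(x))
            for m in range(k + 1, horizon + 1)
        )                                       # condition (b), truncated
        if consistent:
            return x[:n]
    return None

def p(k: int) -> str:
    """Hypothetical program: wrong for k < 5, correct (all zeros) afterwards."""
    return "1" * k if k < 5 else "0" * k

short_prefix = q(p, 3, horizon=50)
long_prefix = q(p, 7, horizon=50)
```

For $n = 3$ the candidates $k = 3$ and $k = 4$ are rejected because $p(5)$ contradicts them, and $k = 5$ is the first stable one, so the early mistakes of $p$ are repaired.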

Theorem 2. For any computable function $\alpha(m)$ there exist infinite sequences
$\omega^1, \omega^2, \omega^3, \ldots$ such that $M(\omega^m) \ge \alpha(m)$ while $K_\infty(\omega^m) \le m + O(1)$.

Proof. Let $x_m$ be the lexicographically first string $x$ of length $\alpha(m)$ such that
$K(x|\alpha(m)) \ge \alpha(m)$. (Such a string exists since the number of programs of length
less than $k$ is less than $2^k$.)

Now let $\omega^m = x_m 0000\ldots$. By definition, $M(\omega^m) \ge K(x_m|\alpha(m)) \ge \alpha(m)$.
On the other hand, $K_\infty(\omega^m) \le m + O(1)$. Indeed, the set $\{x \mid K(x|l(x)) < l(x)\}$
is enumerable. Consider the program $p_m$ that having input $n$ performs $n$ steps of
enumeration of this set. Then the program $p_m$ finds the first string $x_m^n$ of length
$\alpha(m)$ that was not encountered, and outputs the first $n$ bits of the sequence $x_m^n 0000\ldots$.
If $n$ is large enough then $x_m^n = x_m$ and $p_m$ outputs $\omega^m_{1:n}$. It remains to note that
the length of $p_m$ is $\log m + O(1)$. □

Theorem 3. For any computable function $\alpha(m)$ there exist infinite sequences
$\omega^1, \omega^2, \omega^3, \ldots$ such that $K(\omega^m) \ge \alpha(m)$ while $M(\omega^m) \le m + O(1)$.

Proof. Let $c$ be a constant (to be specified later). The set $E = \{(x,k) \mid K(x) < \alpha(k) + c\}$
is enumerable. Consider the process of its enumeration. Let $s(m)$ be
the time (step number) when all pairs of type $(x, m)$ with a given $m$ have
appeared in $E$. Now let $\omega^m = 0^{s(m)} 1111\ldots$.

Let us prove that $K(\omega^m) \ge \alpha(m) - O(1)$. Assume that $p(n) = \omega^m_{1:n}$ for all $n$.
Given $p$ we can find the first 1 in $\omega^m$ and hence $s(m)$. Thus $K(s(m)) \le l(p) + O(1)$.
On the other hand, given $s(m)$ we can find the (lexicographically) first
string $x_m$ of entropy $\alpha(m)$ or more, therefore, $\alpha(m) \le K(x_m) \le K(s(m)) + O(1)$.
Hence $\alpha(m) \le K(\omega^m) + O(1)$.

Let us prove now that $M(\omega^m) \le m + O(1)$. Let the program $q$ on input $n$
output $n$ zeros. Then $q(n) = \omega^m_{1:n}$ for all $n \le s(m)$.

Consider the program $p_m$ that on input $n$ does $n$ steps of enumeration of the
set $E$, finds the number $s(m,n)$ of the last step among them when a new pair
of type $(x, m)$ with a given $m$ has appeared, and then outputs the first $n$
bits of the sequence $0^{s(m,n)} 111111\ldots$. If $n \ge s(m)$, then $p_m$ outputs the correct
prefix of $\omega^m$.

Thus, for any $n$, either $p_m$ or $q$ (given $n$) outputs $\omega^m_{1:n}$. It remains to note
that the length of $p_m$ is $\log m + O(1)$. □

Theorems 2 and 3 can be reinforced using a technique presented in Pj:
they are true for any computable infinite family of distinct sequences $\omega^1, \omega^2, \ldots$
(the family itself should be computable). Anyway, these pathological cases are
rare: the difference between $K(x)$ and $K''(x)$ can be huge but this concerns only
an exponentially small portion of strings $x$ of a given size.

Theorem 4. $K''(\omega) \le M_\infty(\omega) + O(1)$.

Proof. Let $m = M_\infty(\omega) + 1$. Consider the set $T = \{x \mid K(x|l(x)) < m\}$. By
definition, all sufficiently long prefixes of $\omega$ belong to $T$. The set $T$ is enumerable.
For each $n$ there are at most $2^m$ strings of length $n$ in $T$. A string $x \in T$ is called
“good” if there is a sequence $\xi$ such that $x$ is a prefix of $\xi$ and all prefixes of $\xi$
longer than $x$ belong to $T$ (in other words, if $x$ lies on an infinite path in $T$). It
is easy to see that König’s lemma allows one to express the statement “$x$ is good”
as a $\forall\exists$-statement. Therefore, the set $T'$ of all good strings is $0''$-decidable.

This set can be represented as a union of non-overlapping infinite paths:
consider all the strings in order of increasing length; if a string in $T'$ is found
that is not already included in one of the paths, take a path that starts with it
(if there are many of them, choose the lexicographically first, i.e., turn to the
left when possible). The number of different paths does not exceed $2^m$. This
decomposition process is $0''$-effective, i.e., there is a $0''$-algorithm that gives the
$k$-bit prefix of path number $i$ for given $k$ and $i$. Appending $i$ (considered as an $m$-bit
string) to that algorithm, we get a $0''$-program that gives $k$-bit prefixes of the $i$-th
path for all $k$ (this program also needs $m$ to construct $T$ and $T'$, but $m$ is given
implicitly as the length of $i$). Since one of the paths goes along $\omega$, we conclude
that $K''(\omega) \le m + O(1) = M_\infty(\omega) + O(1)$. □

The next two theorems provide the connection between $K_\infty$ and $M_\infty$.

Theorem 5. $K_\infty(\omega) \le 2M_\infty(\omega) + O(1)$.

Theorem 6. There is a sequence $\omega^m$ of infinite strings such that $M(\omega^m) \le m + O(1)$
and $K_\infty(\omega^m) \ge 2m$ (hence $M_\infty(\omega^m), M(\omega^m) = m + O(1)$, $K_\infty(\omega^m) = 2m + O(1)$).

Proof. (The original proof of Theorem 5 was simplified significantly by An. A. Muchnik.)
First, let us define a game that is relevant to both Theorems 5 and 6 and
may be interesting in its own right.

Let $k, l$ be integer parameters. The $(k,l)$-game is played by two players called
the Man (M) and the Nature (N). On its moves, N builds a binary rooted tree.
More specifically, during its move N adds a binary string to a finite set $T$ (initially
empty). On his moves, M may color certain binary strings using colors from the
set $\{1, 2, \ldots, l\}$ (several colors may be attached to the same string; attached
colors cannot be removed later).

The game stops after a finite number of moves if

(1) $T$ is not a tree (that is, there are $x \in T$ and $y \notin T$ such that $y$ is a prefix of
$x$); in this case M wins, or

(2) for some $n$ the number of strings of length $n$ in $T$ (the number of nodes
having depth $n$) exceeds $k$; in this case M also wins, or

(3) there are two different strings of the same length colored by the same color;
in this case N wins.

Otherwise the game lasts indefinitely long, and the winner is determined as
follows. Let $T$ be the ultimate tree (formed by all strings included in $T$ at all
steps). An infinite 0-1-sequence $\beta$ is called an infinite branch of $T$ if $\beta_{1:n} \in T$ for
all $n$.

M wins if for any infinite branch $\beta$ there exists a color $c$ such that all but
finitely many nodes of $\beta$ are colored by $c$ (and, maybe, by other colors).
Otherwise N wins.

(One may give the following interpretation to this game. The tree built by
Nature is the tree of all breeds of animals, and nodes at height $n$ are breeds
existing at time $n$. The coloring is giving names to breeds. Thus Man is required
to give stable names to all eternal breeds.)

We will also use a modified version of this game where the rule (1) is omitted
and the definition of an infinite branch is changed as follows: sequence $\omega$ is an
infinite branch if all but finitely many prefixes of $\omega$ are in $T$. (Obviously, the
modified game is more difficult for M than the original one.)
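
The three stopping rules are finite checks on the current position, so they can be written down directly. The sketch below is an illustration under assumptions made here: the root is the empty string, a position is given as a set `T` of strings together with a map `colours` from strings to sets of colours, the rules are tested in the order (1), (2), (3), and the flag `modified` switches off rule (1) as in the modified game.

```python
from collections import Counter
from typing import Dict, Optional, Set

def game_stopped(T: Set[str], colours: Dict[str, Set[int]], k: int,
                 modified: bool = False) -> Optional[str]:
    """Return the winner ("M" or "N") if a stopping rule applies, else None."""
    # Rule (1): some proper prefix of a string in T is missing from T.
    if not modified:
        for x in T:
            if any(x[:i] not in T for i in range(len(x))):
                return "M"
    # Rule (2): more than k strings of one length.
    if any(count > k for count in Counter(map(len, T)).values()):
        return "M"
    # Rule (3): one colour attached to two different strings of the same length.
    holder: Dict[tuple, str] = {}
    for s, cs in colours.items():
        for c in cs:
            if holder.setdefault((len(s), c), s) != s:
                return "N"
    return None
```

The infinite part of the winning condition (stable colours along every infinite branch) is not a finite check and is deliberately left out of this sketch.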

The following two lemmas play a key role in the proof of Theorems 5 and 6.

Lemma 1. For any $k$, there is a computable winning strategy for M in the
modified $(k, k^2)$-game (the winning algorithm has $k$ as an input).

Lemma 2. N has a computable winning strategy in the $(k,l)$-game if $l < k^2/4$.

Before proving these lemmas, let us finish the proof of Theorems 5 and 6 using
them.

Theorem 5 requires us to prove that $K_\infty(\omega) \le 2M_\infty(\omega) + O(1)$.

Fix $\omega$. Let $T = \{x \mid K(x|l(x)) \le M_\infty(\omega)\}$. Then for any $n$ the set $T$ has
no more than $k = 2^{M_\infty(\omega)+1}$ strings of length $n$. According to our assumption,
$\omega_{1:n} \in T$ for all but finitely many $n$. Thus $\omega$ is an infinite branch in $T$. Consider
now the following strategy for N in the modified $(k, k^2)$-game: N just enumerates $T$
(ignoring M’s replies). M can defeat this strategy using his computable strategy
that exists according to Lemma 1.

Since both M and N are using computable strategies, the set
$C = \{(x,p) \mid \text{node } x \text{ gets color } p \text{ at some stage}\}$ is enumerable. As M wins, there is a color $p$
that is attached to $\omega_{1:n}$ for all sufficiently large $n$. Each color can be considered
as a binary string of length $2(M_\infty(\omega) + 1)$, since there are at most $k^2$ colors.

The following algorithm computes $\omega_{1:n}$ given $n$ and $p$. First find the value
$k = 2^{M_\infty(\omega)+1} = 2^{l(p)/2}$. Second, enumerate $C$ until a pair $(x,p)$ appears with
$l(x) = n$, i.e., until some node $x$ having depth $n$ gets color $p$. Then return $x$. For
all sufficiently large $n$ this algorithm will return $\omega_{1:n}$ (since the infinite branch
$\omega$ has color $p$ assigned).
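
The second step of this algorithm is a plain search through an enumeration. A minimal sketch, assuming the enumeration of $C$ is available as a Python iterator of (string, colour) pairs; the function name `prefix_from_colour` and the finite sample enumeration are hypothetical and only serve the illustration.

```python
from typing import Iterable, Iterator, Optional, Tuple

def prefix_from_colour(C: Iterable[Tuple[str, str]], n: int, p: str) -> Optional[str]:
    """Return the first enumerated string of length n that carries colour p."""
    for x, colour in C:
        if colour == p and len(x) == n:
            return x
    return None   # only reachable when the enumeration is finite

def enumerate_C() -> Iterator[Tuple[str, str]]:
    """Hypothetical finite stand-in for the enumeration of C."""
    yield from [("0", "10"), ("1", "01"), ("01", "10"),
                ("00", "01"), ("011", "10")]

two_bits = prefix_from_colour(enumerate_C(), 2, "10")
three_bits = prefix_from_colour(enumerate_C(), 3, "10")
```

Because rule (3) of the game forbids two strings of the same length with the same colour, the first hit is the only possible one, which is why the search may stop there.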

The program $q$ to compute $\omega_{1:n}$ given $n$ for almost all $n$ consists of the above
algorithm with the string $p$ appended. Thus, the length of $q$ is $2M_\infty(\omega) + O(1)$,
and Theorem 5 (modulo Lemma 1) is proved.

Now let us derive Theorem 6 from Lemma 2. We need to prove that there exist
infinite sequences $\omega^1, \omega^2, \ldots$ such that $M(\omega^m) \le m + O(1)$ and $K_\infty(\omega^m) \ge 2m$.

For any fixed $m$ consider the following strategy for M. He enumerates all
triples $(p,n,x)$ such that $p(n) = x$; if it turns out that $l(x) = n$ and $l(p) < 2m$,
he assigns color $p$ to string $x$. This strategy may be performed by an algorithm
having $m$ as an input.

Let $k = 2^{m+1}$, $l = 2^{2m} - 1$. Since $l < k^2/4$, Lemma 2 guarantees that N
could defeat this strategy using its own computable strategy. Therefore, there
exists an algorithm $A$ that given $m$ generates a tree $T^m$ which has an infinite
branch $\omega$ that is not properly colored, i.e., there is no $p$ of length less than $2m$
such that $p(n) = \omega_{1:n}$ for almost all $n$. In other words, $K_\infty(\omega) \ge 2m$.

On the other hand, $M(\omega) \le m + O(1)$. Indeed, let $n$ be a natural number.
Let us describe a program of size $m + O(1)$ that computes $\omega_{1:n}$. Consider an
algorithm $B$ that for a given string $q$ of length $m + 1$ and for any $n$ uses $A$ to
generate $T^m$ and waits until $q$ nodes (here $q$ is identified with its ordinal number
among all strings of length $m + 1$) at height $n$ appear. Then $B$ outputs the node
that appeared last. Since $\omega_{1:n} \in T^m$, for some $q$ the output will be equal to
$\omega_{1:n}$. The string $q$ appended to $B$ constitutes a program to compute $\omega_{1:n}$ given
$n$. This program has size $m + O(1)$.

Theorem 6 is proved (modulo Lemma 2).

Now we have to prove Lemmas 1 and 2.

Recall that Lemma 1 says that for any $k$, there is a computable winning
strategy for M in the modified $(k, k^2)$-game (the winning algorithm has $k$ as an
input).

Proof. (Using An. Muchnik’s argument.) Let M use $k^2$ colors indexed by pairs
$(a, b)$, where $a$ and $b$ are natural numbers in range $1..k$. Let us explain how the
color $(a, b)$ is assigned. (Different colors are assigned independently.) Observing
the growing set $T$, M looks for all pairs of strings $u$ and $v$ such that:

(a) $u$ has number $a$ if we count all the (already appeared) strings in $T$ of the
same length as $u$ in the lexicographic order;

(b) $v$ has number $b$ if we count all the (already appeared) strings in $T$ of the
same length as $v$ in the reverse lexicographic order;

(c) $u$ is a prefix of $v$.

After such a pair of strings is found, any prefix of $u$ gets color $(a, b)$ unless some
other string of the same length already has this color (and M is prohibited from
using $(a, b)$ again on that level). Then M looks for another pair of strings $u$ and $v$
with the same properties, etc.
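
One pass of this colouring rule over a fixed snapshot of $T$ can be sketched as follows. This is an illustration under assumptions, not the strategy itself: strings are ranked within their own length level (as in the definition of $a_n$ and $b_n$ used in the rest of the proof), pairs are visited in sorted order, and the dictionary `assigned` (a name introduced here) records for each colour and level the single string that holds it. In the actual game M repeats such passes as $T$ grows.

```python
from typing import Dict, Set, Tuple

Colour = Tuple[int, int]

def rank(s: str, T: Set[str], reverse: bool = False) -> int:
    """1-based position of s among the strings of T that have the same length."""
    level = sorted((t for t in T if len(t) == len(s)), reverse=reverse)
    return level.index(s) + 1

def colour_round(T: Set[str], assigned: Dict[Tuple[Colour, int], str]) -> None:
    """One pass of M over the snapshot T; updates `assigned` in place.

    assigned[(colour, n)] is the unique string of length n carrying `colour`.
    """
    for u in sorted(T):
        for v in sorted(T):
            if not v.startswith(u):                          # condition (c)
                continue
            colour = (rank(u, T), rank(v, T, reverse=True))  # conditions (a), (b)
            for i in range(len(u) + 1):                      # every prefix of u
                # the first holder of a colour on a level keeps it
                assigned.setdefault((colour, i), u[:i])

T = {"0", "1", "00", "01"}
assigned: Dict[Tuple[Colour, int], str] = {}
colour_round(T, assigned)
```

Using `setdefault` enforces the restriction that a colour is never given to a second string of the same length, which is exactly what keeps M from losing by rule (3).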

We need to prove that this strategy guarantees that any infinite branch will
be colored uniformly starting at some point. Let $T$ be the set of all strings that
N gives (at all steps). Let $\omega$ be an infinite branch, so $\omega_{1:n} \in T$ for all sufficiently
large $n$. For these $n$ let $a_n$ denote the lexicographic number of $\omega_{1:n}$ in the set $T_n$
of all strings of length $n$ that are in $T$, and let $b_n$ denote the inverse lexicographic
number of $\omega_{1:n}$ in $T_n$. Let $a = \limsup a_n$ and $b = \limsup b_n$. We claim that for
sufficiently large $n$ the string $\omega_{1:n}$ will have color $(a, b)$.

Indeed, consider a pair $(u, v)$ that satisfies the conditions listed above. Let
us prove first that for sufficiently long sequences only prefixes of $\omega$ have a chance
to get colored with color $(a, b)$. Indeed, for large enough $n$ we have $a_n \le a$, so
sufficiently long strings $u$ are “on the right of $\omega$” or are prefixes of $\omega$. (“On the
right of $\omega$” means that $u$ follows the prefix of $\omega$ having the same length, in the
lexicographic order.) For the same reasons all sufficiently long strings $v$ are on
the left of $\omega$ or are prefixes of $\omega$. Therefore, the only chance for $u$ to be a prefix
of $v$ (if both are long enough) is when both $u$ and $v$ are prefixes of $\omega$. Therefore,
no other long strings (except prefixes of $\omega$) could get color $(a, b)$.

According to the definition of $a$ and $b$ there are infinitely many $n$ such that
$a_n = a$ and infinitely many $m$ such that $b_m = b$. Choose a pair of such $n$ and $m$;
assume that $n < m$. The strings $u = \omega_{1:n}$ and $v = \omega_{1:m}$ will be discovered after
all strings of length $n$ and $m$ appear in the enumeration of $T$ since they will have
correct ordinal numbers. And all prefixes of $u$ will get color $(a, b)$ unless some
other vertex of the same length already has this color. (And this is possible only
for short strings, as we have seen.) Since $u$ may be arbitrarily long, all sufficiently
long prefixes of $\omega$ will get color $(a, b)$. Lemma 1 is proved.

Lemma 2 says that N has a computable winning strategy in the $(k,l)$-game if
$l < k^2/4$.

Proof. Let m = k/2. First we introduce some terminology. We consider finite 
trees T with m distinguished leaves at the height equal to height of the tree. 
Those distinguished leaves are called tops of the tree. The m paths from the root 
to m tops are called trunks of the tree. All the nodes that belong to the trunks 
are called trunk nodes; other are called side nodes. 

We call a tree T' an extension of a tree T if (a) T C T'; (b) T' does not 
contain new vertices on the levels that exist in T (i.e., any string is T' — T is 
longer than any string in T); (c) all trunks of T' continue those of T (that is, 
jth trunk of T' continues jth trunk of T for all j < m). 

First N builds any tree Tq of width m that has m trunks. Then N continues 
all the m trunks of Tq (for example, by adding, for any top v, nodes uO, uOO, 
and so on) and waits until M starts to color nodes on the trunks (otherwise he 
looses). More specifically, N waits until there exists hi such that the nodes at 
height hi on all m trunks are colored. We call those nodes special ones. The 
colors of special nodes are be pairwise different, as the special nodes are at the 
same height (otherwise M looses). Let h 2 be the height of trunks when M colors 
the last special node (/i 2 > hi). 

N has just forced M to use m different colors and has constructed a finite tree
of width m. However, we wish (for the next iteration) that the nodes colored in
m different colors do not belong to trunks, at the expense of increasing the width
of the tree by 1. This is done as follows. Once N has forced M to color m special
nodes at the same height $h_1$, it chooses one of the trunks and cuts it (this means
that N will not continue that trunk). Then N takes the father of the special node
on that trunk and starts from the father another trunk instead of the cut trunk.
The nodes lying on the cut trunk from the height $h_1$ to $h_2$ become side nodes.
Thus at least one side node is colored. Call that node a distinguished node. After
that N still grows m trunks in parallel (continuing the $m - 1$ non-cut trunks and the
trunk having a branch with the distinguished node) until M colors m nodes on
m trunks at a new height $h_3 > h_2$.

Call those nodes the new special nodes. Now N chooses a trunk whose new
special node is colored in a color different from the color of the distinguished
node, cuts it and starts a new trunk from its node at height $h_3 - 1$. We thus
obtain a second side node colored in a color different from the color of the
distinguished node. Call that side node also a distinguished node. Thus we have
two distinguished side nodes having different colors.

This process is repeated m times. Each time N cuts a trunk whose special
node is colored in a color different from the colors of the existing distinguished
nodes (such a special node exists while the number of distinguished nodes is
less than m). After m repetitions we have a tree of width $m + 1$ that has m
distinguished side nodes colored in m different colors.

The described strategy will be denoted by $S_1$. Its starting point may be any
tree T with m trunks. It either terminates and constructs an extension T' of T
such that $T' - T$ is colored in m different colors, or wins. The set $T' - T$ has
width $m + 1$.

Now let us describe the induction step. Assume X is a subset of a tree T.
Let colors(X) [sidecolors(X)] denote the set of colors of all nodes [all side nodes]
in X.

Assume we have a strategy $S_i$ ($i < m$) for N with the following properties.
Starting from any tree T with m trunks it constructs a finite extension T' of T
such that the difference $T' - T$ has width $m + i$ and $|\mathrm{sidecolors}(T' - T)| \ge im$.

Our goal is to define a strategy $S_{i+1}$ satisfying the same conditions (for the
increased value of i). We first define an auxiliary strategy $S'_{i+1}$ that, starting
from any tree T with m trunks, constructs a finite extension T' of T such that the
difference $T' - T$ has width $m + i$, $|\mathrm{colors}(T' - T)| \ge (i+1)m$, and
$|\mathrm{sidecolors}(T' - T)| \ge im$ (or $S'_{i+1}$ wins).

The strategy $S'_{i+1}$, given a tree T, works as follows. Apply $S_i$ starting from
T. Wait until $S_i$ terminates. Let $T_1$ be the continuation of T constructed by
$S_i$. Then $|\mathrm{sidecolors}(T_1 - T)| \ge im$. Apply $S_i$ starting from $T_1$. Wait until $S_i$
constructs a continuation $T_2$ of $T_1$ with $|\mathrm{sidecolors}(T_2 - T_1)| \ge im$. Applying $S_i$
many times, we get $T_1, T_2, T_3, \ldots$. Wait until there exist j and s such that $j < s$
and all the nodes along all the trunks inside $T_j - T_{j-1}$ at step s are colored and
each trunk has its own color (if no such j and s exist, the strategy $S'_{i+1}$ never
terminates and wins). Let $T' = T_s$. The tree $T_s$ has im different colors on side
nodes in $T_j - T_{j-1}$ and m new colors on nodes on the m trunks.

Now we are able to define the strategy $S_{i+1}$. Starting from a tree T it works as
follows. Apply $S'_{i+1}$ starting from T. Wait until it terminates. Let $T_1$ denote the
resulting tree. The set $\mathrm{colors}(T_1 - T)$ has at least $(i+1)m$ colors. The problem,
however, is that some of them may be used for trunk nodes only. In this case
choose a trunk of $T_1$ that has a node colored in a color
$c \in \mathrm{colors}(T_1 - T) - \mathrm{sidecolors}(T_1 - T)$. Let j be the number of that trunk. We add to $T_1$ a new
branch starting from the jth top of T and declare this branch a new trunk of $T_1$;
the old jth trunk is not a trunk anymore. This operation increases the width of
$T_1 - T$ to $m + i + 1$. The gain is that the set $\mathrm{sidecolors}(T_1 - T)$ has got a new color c.
So $|\mathrm{sidecolors}(T_1 - T)| \ge im + 1$ now. If it happens that the set $\mathrm{sidecolors}(T_1 - T)$
already has at least $(i+1)m$ colors, we stop. Otherwise, we apply once more the
strategy $S'_{i+1}$ starting from $T_1$. We get $T_2$ such that $|\mathrm{colors}(T_2 - T_1)| \ge (i+1)m$.
As $|\mathrm{sidecolors}(T_1 - T)| < (i+1)m$, the set $\mathrm{colors}(T_2 - T_1)$ has at least one
color that does not belong to $\mathrm{sidecolors}(T_1 - T)$. We choose again a color c from
$\mathrm{colors}(T_2 - T_1) - \mathrm{sidecolors}(T_1 - T)$, choose a trunk node in $T_2 - T_1$ colored
by c, make a new trunk from the top of $T_1$ lying on that trunk and thus get
$|\mathrm{sidecolors}(T_2 - T)| \ge |\mathrm{sidecolors}(T_1 - T)| + 1 \ge im + 2$. Repeating this trick at most
m times, we obtain an extension T' such that $|\mathrm{sidecolors}(T' - T)| \ge (i+1)m$ and
the width of $T' - T$ is at most $m + i + 1$.

The induction step is described. Note that the strategy $S_m$ wins in the
$(2m, m^2 - 1)$-game. □

Abstract. In this paper we consider the complexity of several problems
involving finite algebraic structures. Given finite universal algebras A
and B, these problems ask: (1) Do A and B satisfy precisely the same
identities? (2) Do they satisfy the same quasi-identities? and (3) Do A
and B have the same set of term operations?

In addition to the general case in which we allow arbitrary (finite) algebras,
we consider each of these problems under the restrictions that all
operations are unary, and that A and B have cardinality two. We briefly
discuss the relationship of these problems to algebraic specification theory.



There are several relationships between mathematical structures that might 
be considered “fundamental”. First and foremost is certainly the isomorphism 
relation. Questions about isomorphic structures occur throughout mathematics 
and apply to universal algebras, topological spaces, graphs, partially ordered 
sets, etc. Many other relationships are more specialized. For example, given two 
graphs G and H, one may wish to know whether H is a subgraph of G, or 
perhaps a minor of G. 

Properly formulated, questions about these relationships give rise to complexity
questions. Generally speaking, we must impose some sort of finiteness
assumption on the structures in question so that notions of computational
complexity make sense. The complexity of various isomorphism problems has
received a great deal of attention. The graph isomorphism problem has been
intensively studied, partly because its exact relationship to the classes P and NP
is still unknown, and partly because it provides a paradigm for other problems
of unknown complexity status. In this case, both graphs are assumed to have
finitely many vertices and finitely many edges. With a similar formulation, the
isomorphism problem for algebras has the same complexity as does graph
isomorphism. More generally, Kozen showed that the isomorphism problem

* The complete version of this paper is available from 
http : //www.math. iastate . edu/ cbergman/papers .html 

C. Meinel and S. Tison (Eds.): STACS’99, LNCS 1563, pp. 163-|1Y^ 1999. 

(c) Springer- Verlag Berlin Heidelberg 1999 



164 Clifford Bergman and Giora Slutzki 



for finitely presented algebras has this same complexity. See for further 

discussion and references on the isomorphism problem. 

In this paper we consider the complexity of three relationships that arise 
from considerations in universal algebra. Any algebraic structure satisfies certain 
identities and fails to satisfy others. Roughly speaking, an identity is an equality 
between two expressions built from the operations of the algebra. Examples 
of identities are the associative law (which involves one binary operation) and 
DeMorgan’s law (two binary and one unary operation). Identities are one of the 
primary organizing tools in algebra. 

Given two algebras A and B, we may ask whether they satisfy precisely the 
same set of identities. Notice that this is a far weaker notion than isomorphism. 
For example, any algebra satisfies the same identities as each of its direct powers. 
Nevertheless, if A and B satisfy the same identities, then they will be constrained 
to behave in a similar way. One of our problems, called Var-Equiv, is this: 
Given two finite algebras of the same finite similarity type, determine whether 
they satisfy the same identities. 

This problem has implications for several areas of computer science. Formal 
algebraic specifications are expressions in a mathematical language which de- 
scribe the properties and/or input-output behavior that a software system must 
exhibit, without putting any restrictions on the way in which these properties 
are implemented. This abstraction makes formal specifications extremely useful 
in the process of developing software systems where it serves as a reference point 
for users, implementers, testers and writers of instruction manuals. Formal spec- 
ifications have been applied successfully in deployment of sophisticated software 
systems, see m. especially the references there. 

Mathematically, formal algebraic specifications are firmly grounded on al- 
gebraic concepts, especially ideas, notions and methods from universal algebra 

The relationship between implementation and equational specification cor- 
responds, in algebraic terms, to the relationship between an algebra and a set of 
identities satisfied by the algebra. Thus, two algebras that satisfy the same iden- 
tities correspond to a pair of implementations with precisely the same specifica- 
tion. The computational complexity of these problems, in the universal algebraic 
framework, is thus quite relevant to the body of research in formal specification 
theory, and to the construction of supporting tools such as theorem provers and 
model checkers. 

Generalizing the notion of identity, we arrive at a quasi-identity. We shall 
leave a precise definition for Sect. ^ but crudely speaking, a quasi-identity in- 
volves a conjunction of identities and an implication. An example is the left- 
cancellation law (for, say, a semigroup). In direct analogy with the previous 
problem we can ask for the complexity of the following. Given two finite alge- 
bras of the same finite similarity type, determine whether they satisfy exactly the 
same quasi-identities. This notion too extends to algebraic specification theory, 
since “conditional specifications” take the form of quasi-identities. 

Our third problem involves the term operations of an algebra. Although an 
algebra may be endowed with only finitely many basic operations, we can construct
many more by composing the basic ones in various combinations. These
are called the term operations of the algebra. Two algebras (presumably of dif- 
ferent similarity types) are called term-equivalent if they have the same universe 
and exactly the same set of term operations. In universal algebra, term-equivalent 
algebras are considered the same, “for all practical purposes” . The problem we 
call Term-Equiv is that of determining whether two finite algebras are term- 
equivalent. Returning once again to the realm of specification theory, in this 
problem we are asking whether a pair of implementations for two entirely dif- 
ferent specifications have the property that they exhibit the same input-output 
behavior. 

Each of these three problems makes sense for arbitrary finite algebras with 
an arbitrary (but finite) set of basic operations. In addition to this most general 
formulation we consider, for each of the three problems, two more restricted 
settings that, experience tells us, may result in different complexities. The first 
is to require that all basic operations on our algebras be unary. In the second, 
we only consider algebras of cardinality two. 

We would like to thank Joel Berman, Gary Leavens and Ross Willard for 
many helpful discussions on this and related topics. 



1 Preliminaries 

We shall assume that the reader is familiar with the fundamental definitions 
and concepts of universal algebra. Our primary reference for this material is 
m. Other good references are |^, especially for the material on quasivarieties, 
and |3- Although a bit dated, Taylor’s survey in uni is particularly readable. 

We use the notation V(A) (respectively Q(A)) for the variety (quasivariety) 
generated by an algebra A, and Clo(A) for the clone of term operations on A. 
We write A ~ B to indicate that the algebras A and B have the same similarity 
type. Throughout the paper, all algebras are assumed to have finite similarity 
type. 

We assume that the reader is familiar with the most common notions of 
complexity theory. Our notation and definitions for complexity classes comes 
from |H| . In particular, we use the names L and NL for the classes of languages 
decidable in logarithmic and nondeterministic logarithmic space, respectively. 
Most of our problems require a pair of algebras as input. Let us be more specific 
as to the form we assume the input will take. The underlying set of an algebra 
can be assumed to be {0, 1, . . . , n — 1} for some positive integer n. In fact, this 
set can be represented in the input by its cardinality, which requires log n bits 
of storage. (All logarithms will be to the base 2.) A k-ary operation on this set
is represented as a table of values, in other words, a k-dimensional array with
both the indices and entries coming from {0, 1, ..., n − 1}. Notice that this can
be represented in the input stream using $n^k \cdot \log n$ bits.
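
As a rough illustration of this encoding, the following sketch counts the bits needed for an algebra given its cardinality and the ranks of its basic operations. The function names are hypothetical, and rounding log n up to a whole number of bits (with a minimum of one bit) is an assumption made here for concreteness.

```python
def bits_per_element(n):
    """Bits to write one element of {0, ..., n-1}: log n rounded up (assumed at least 1)."""
    return max(1, (n - 1).bit_length())


def operation_bits(n, k):
    """A k-ary operation is a table with n**k entries, each taking bits_per_element(n) bits."""
    return n ** k * bits_per_element(n)


def algebra_bits(n, ranks):
    """Bits for the cardinality plus one table per basic operation (ranks lists their arities)."""
    return bits_per_element(n) + sum(operation_bits(n, k) for k in ranks)
```

For instance, under these assumptions a single binary operation on a four-element set takes 16 table entries of 2 bits each.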
2 Discussion of the Problems 

In this paper we shall consider three equivalence relations on algebraic structures. 
First, given two algebras A and B of the same similarity type, is V(A) = V(B)? 
This is equivalent to asking whether A and B satisfy exactly the same identities. 
Note that this only makes sense if the two algebras have the same similarity type. 
It was shown in [0| that this problem is decidable. We shall denote this problem 
Var-Equiv. Thus 

Var-Equiv = { (A, B) : A ~ B & V(A) = V(B) } . 

We have an analogous problem for quasivarieties: 

Qvar-Equiv = { (A, B) : A ~ B & Q(A) = Q(B) } . 

The assertion $(A, B) \in$ Qvar-Equiv is equivalent to A and B satisfying exactly
the same quasi-identities. Surprisingly, even though the logical form of a
quasi-identity is much more complicated than that of an identity, Qvar-Equiv has
a relatively low computational complexity compared to Var-Equiv. Note that
Qvar-Equiv $\subseteq$ Var-Equiv as sets.

The third problem we shall consider is term-equivalence. Two algebras A
and B are term-equivalent if and only if they have the same underlying set and
Clo(A) = Clo(B). For this problem, we do not require that A and B have the
same similarity type, but we do require that they have the same universe:

Term-Equiv = { (A, B) : A = B & Clo(A) = Clo(B) } . 

It was shown in [H that Term-Equiv is complete for EXPTIME. 

There are several restrictions of these problems which are of interest and 
which turn out to have a lower complexity. In particular, we can bound either the 
cardinality of the underlying sets or the ranks of the operations of the algebras. 
For example, it was shown in HH that Term-Equiv is complete for PSPACE 
when restricted to unary algebras, that is, algebras in which every operation has 
rank 1. For each of our three problems, we shall consider, in addition to the 
general case, the subcases obtained by considering only unary algebras and only 
2-element algebras. We shall denote the subcase by appending a superscript ‘1’ 
or subscript ‘2’ to the problem. To be precise, let us define 

$U = \{\, A : A \text{ is a unary algebra} \,\}$
$T = \{\, A : A = \{0, 1\} \,\}$

then $X^1 = X \cap (U \times U)$ and $X_2 = X \cap (T \times T)$ for X any of Term-Equiv,
Var-Equiv, or Qvar-Equiv.

Our results for each of these nine problems are summarized in Table 1.
In this table, the first row concerns the subcase consisting of 2-element algebras,
the second of unary algebras and the third, the general case. Each of the nine
entries gives the smallest complexity class known to contain the problem, and
a superscript ‘*’ indicates that the result is sharp, i.e., the problem is complete
for the given complexity class.

             Term-Equiv   Var-Equiv    Qvar-Equiv
2-element    NL           L            L
unary        PSPACE*      PSPACE       NP
general      EXPTIME*     2-EXPTIME    NP*


3 The Quasivariety Problems 

We begin with the problems that ask whether two algebras generate the same 
quasivariety. It is sometimes convenient to work with an asymmetric variant of 
this. We write 



Qvar-Mem = { (A, B) : A ~ B & $B \in Q(A)$ } . 

Since the ‘Q’ operator has the usual properties of closure, we obviously have 

$(A, B) \in$ Qvar-Equiv $\iff$ $(A, B), (B, A) \in$ Qvar-Mem.   (1)

It follows that for an instance of size s, membership in Qvar-Equiv can be
tested with two calls to an algorithm for Qvar-Mem, both using inputs of
size s. In a natural way, we also have the restricted problems Qvar-Mem$^1$ and
Qvar-Mem$_2$ consisting of pairs of unary and two-element algebras, respectively.

Theorem 1. Qvar-Mem $\in$ NP.

Proof. Let A and B be a pair of similar, finite algebras. We wish to determine
whether $B \in Q(A)$. Here is a nondeterministic algorithm. For each unordered
pair {a, b} of distinct elements of B, guess a function $\psi_{\{a,b\}} : B \to A$ such that
$\psi_{\{a,b\}}(a) \ne \psi_{\{a,b\}}(b)$. Test whether $\psi_{\{a,b\}}$ is a homomorphism. If it is not, then
reject. But if every $\psi_{\{a,b\}}$ passes the homomorphism test, then accept.

Since A is finite, Q(A) = SP(A). It is a simple matter to verify that our
algorithm accepts the pair (A, B) if and only if $B \in Q(A)$. To bound the running
time, let s denote the size of the input. A function $\psi$ from B to A can be guessed
in time on the order of $|B| \cdot |A|$, which is at most $s^2$. The verification that $\psi$ is a
homomorphism also takes time in $O(s^2)$. The total number of functions we need
to construct is $\le s^2$. Thus the total running time lies in $O(s^4)$. □
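
The criterion used in this proof can be sketched deterministically by replacing each nondeterministic guess with an exhaustive search over all functions from B to A. This is only an illustration and takes exponential time; the representation of an algebra as a cardinality plus a list of (rank, table) pairs, and the function names, are assumptions of the sketch rather than part of the paper.

```python
from itertools import combinations, product


def is_homomorphism(psi, size_b, ops_b, ops_a):
    """Check psi(f^B(x1..xk)) == f^A(psi(x1)..psi(xk)) for every basic operation f."""
    for (k, f_b), (_, f_a) in zip(ops_b, ops_a):
        for args in product(range(size_b), repeat=k):
            if psi[f_b[args]] != f_a[tuple(psi[x] for x in args)]:
                return False
    return True


def in_quasivariety(size_a, ops_a, size_b, ops_b):
    """B lies in SP(A) iff every pair a != b in B is separated by a homomorphism B -> A."""
    for a, b in combinations(range(size_b), 2):
        separated = any(
            psi[a] != psi[b] and is_homomorphism(psi, size_b, ops_b, ops_a)
            for psi in product(range(size_a), repeat=size_b)  # all maps B -> A
        )
        if not separated:
            return False
    return True
```

Here each table is a dictionary from argument tuples to values, and the two operation lists are assumed to interpret the same symbols in the same order.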

Corollary 1. The following problems lie in NP: Qvar-Equiv, Qvar-Mem$^1$,
Qvar-Mem$_2$, Qvar-Equiv$^1$ and Qvar-Equiv$_2$.



Theorem 2. Qvar-Mem$^1$, Qvar-Mem, SubAlg$^1$, SubAlg and Qvar-Equiv
are all complete for NP.

Let us point out that Theorem 2 does not include Qvar-Equiv$^1$. The exact
complexity of Qvar-Equiv$^1$ is open.

Now we turn to the problem Qvar-Equiv$_2$. Suppose that A and B are two-element
algebras. We claim that $B \in Q(A)$ if and only if $B \cong A$. To see this,
note that every two-element algebra is simple. Since $|B| = |A|$, we obtain

$B \in Q(A) = SP(A) \implies B \in S(A) \implies B \cong A$.

The converse, that $B \cong A \implies B \in Q(A)$, is trivial.

Theorem 3. Qvar-Equiv$_2$, Qvar-Mem$_2$ $\in$ L.

Proof. As we argued in the previous paragraph, $(B, A) \in$ Qvar-Mem$_2$ if and
only if $B \cong A$. There are only two bijections from B to A, and each of these
can be tested to see if it is a homomorphism. The testing requires just a couple
of counters, which have a space bound that is logarithmic in the size of
the input. Thus Qvar-Mem$_2$ $\in$ L. Now apply assertion (1) to deduce that
Qvar-Equiv$_2$ $\in$ L. □
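
The test described in this proof can be sketched as follows: try both bijections between the two universes and check the homomorphism condition on every table entry. The sketch does not model the logarithmic space bound, and the (rank, table) representation and the function name are assumptions made for illustration.

```python
from itertools import product


def isomorphic_two_element(ops_a, ops_b):
    """Decide whether two algebras on {0, 1} of the same similarity type are isomorphic."""
    for psi in ((0, 1), (1, 0)):  # the only two bijections from B to A
        if all(
            psi[f_b[args]] == f_a[tuple(psi[x] for x in args)]
            for (k, f_a), (_, f_b) in zip(ops_a, ops_b)
            for args in product((0, 1), repeat=k)
        ):
            return True
    return False
```

For example, conjunction and disjunction on {0, 1} give isomorphic algebras via the bijection that swaps 0 and 1.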

4 The Variety Problems 

The problem Var-Equiv asks: if A and B are two algebras of the same similarity 
type, is V(A) = V(B)? As with quasivarieties, it is convenient to introduce an 
auxiliary problem, Var-Mem. 

Var-Mem = { (A, B) : A ~ B & $B \in V(A)$ } . 

Unlike the situation for quasivarieties, the relationship between Var-Equiv and
Var-Mem is clear-cut. We have

$(A, B) \in$ Var-Equiv $\iff$ $(A, B), (B, A) \in$ Var-Mem,
$(A, B) \in$ Var-Mem $\iff$ $(A, A \times B) \in$ Var-Equiv.   (2)

The second equivalence follows from the fact that both A and B are homomorphic
images of $A \times B$.

We begin with the two-element problem. The crucial point is the following 
theorem. 

Theorem 4. Let A and B be two-element algebras of the same similarity type.
Then V(A) = V(B) if and only if $A \cong B$.

Theorem 5. Var-Equiv$_2$ $\in$ L.

Proof. From Theorem 4, testing whether $(A, B) \in$ Var-Equiv is equivalent to
testing $A \cong B$. Arguing as we did at the end of Sect. 3, there are only two
possible isomorphisms to test. This can be done deterministically in logarithmic
space. □

In order to proceed to the remaining two problems, we need some more
detailed information on the relationship between clones, terms and varieties. Let
$\mathbf{A} = (A, F)$ be an algebra of cardinality n and let m be a positive integer. It
is always the case that the set $\mathrm{Clo}_m(\mathbf{A})$ forms a subalgebra of $\mathbf{A}^{(A^m)}$ that we
denote $\mathbf{Clo}_m(\mathbf{A})$. Notice that we follow the usual typographic convention and
print ‘Clo’ in boldface when it is to be used as an algebra. Since varieties are
closed under the formation of both powers and subalgebras, it follows that both
$\mathbf{A}^{(A^m)}$ and $\mathbf{Clo}_m(\mathbf{A})$ lie in $V(\mathbf{A})$.

Every term on A can be viewed as a tree in a natural way. Let us write ht(t)
for the height of the tree corresponding to the term t. If one thinks about the
natural way to construct the set $\mathrm{Clo}_m(\mathbf{A})$, one sees that for every m-ary term t,
there is a term t' such that $t^{\mathbf{A}} = (t')^{\mathbf{A}}$ and $\mathrm{ht}(t') \le n^{n^m}$. In the special case
that A is a unary algebra, we can do better. Since every m-ary term operation
(for any m) is essentially unary, the bound on the height of t' can be reduced to
$n^n$. Using these remarks, we have the following theorem.

Theorem 6. Let A and B be finite algebras of the same similarity type. Assume
that the cardinalities of A and B are n and m respectively. Then the following
are equivalent.

(i) $B \in V(A)$.

(ii) For every pair of terms s and t, each of height at most $n^{n^m}$, if A satisfies
the identity $s \approx t$ then so does B.

(iii) B is a homomorphic image of the algebra $\mathbf{Clo}_m(\mathbf{A})$.

If A and B are unary algebras, then the bound $n^{n^m}$ in (ii) can be reduced to $n^n$.



Theorem 6 suggests an approach that can be used to test the condition
$B \notin V(A)$: simply guess an identity $\varepsilon$ and check to see whether A satisfies $\varepsilon$ while
B fails to satisfy $\varepsilon$. This approach seems to be quite effective, at least for unary
algebras. For in this case, we have the improved bound $n^n$ in part (ii) of the
theorem.

Let us fix a set $F = \{f_1, \ldots, f_k\}$ of operation symbols, each of rank 1.
Also, let us add an additional unary operation symbol $f_0$ which will always be
interpreted as the identity operation. This has no effect on the algebras, but
will save us a subscript in our analysis. A typical term over F is of the form
$f_{i_l} f_{i_{l-1}} \cdots f_{i_2} f_{i_1}(x)$ where $i_1, i_2, \ldots, i_l \in \{0, 1, \ldots, k\}$. The height of this term
is l. Since each term involves only one variable, every identity is of one of two
possible forms:

$s(x) \approx t(x)$ or $s(x) \approx t(y)$.

Notice that the second of these is quite degenerate since it requires that the 
term operations corresponding to s and t both be constant, and in fact, the 
same constant. Nevertheless, it must be considered in the analysis. 

Now suppose that A and B are algebras of type F and of cardinalities n
and m respectively. Algorithm 1 is a nondeterministic algorithm that accepts
the pair (A, B) if and only if $B \notin V(A)$.

$s^A \leftarrow t^A \leftarrow f_0^A$; $s^B \leftarrow t^B \leftarrow f_0^B$.
for i = 1 to $n^n$ do
    guess $j, \ell \in \{0, 1, \ldots, k\}$
    $s^A \leftarrow f_j^A \circ s^A$, $s^B \leftarrow f_j^B \circ s^B$, $t^A \leftarrow f_\ell^A \circ t^A$, $t^B \leftarrow f_\ell^B \circ t^B$
    if $((\forall x, y \in A)\,(s^A(x) = t^A(y))$ and $(\exists x, y \in B)\,(s^B(x) \ne t^B(y)))$ or
       $((\forall x \in A)\,(s^A(x) = t^A(x))$ and $(\exists x \in B)\,(s^B(x) \ne t^B(x)))$
    then accept.

Algorithm 1. Testing $(A, B) \notin$ Var-Mem$^1$
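
A deterministic counterpart of Algorithm 1 can be sketched by exploring, breadth first, every tuple $(s^A, t^A, s^B, t^B)$ reachable by composing with the basic operations, instead of guessing one path of length $n^n$. This trades the polynomial space bound for simplicity, so it is an illustration of the acceptance condition rather than of the space analysis. Unary operations are represented as tuples of values, and all function names are assumptions of the sketch.

```python
from collections import deque


def compose(f, g):
    """Value vector of f after g, for unary operations stored as tuples."""
    return tuple(f[g[x]] for x in range(len(g)))


def separates(s_a, t_a, s_b, t_b):
    """True if the identity s = t holds in A but fails in B, in either of its two forms."""
    n, m = len(s_a), len(s_b)
    # Form s(x) = t(y): both term operations equal the same constant.
    holds_a = all(s_a[x] == t_a[y] for x in range(n) for y in range(n))
    holds_b = all(s_b[x] == t_b[y] for x in range(m) for y in range(m))
    if holds_a and not holds_b:
        return True
    # Form s(x) = t(x).
    return s_a == t_a and s_b != t_b


def not_in_variety(n, ops_a, m, ops_b):
    """Search all reachable term pairs for an identity of A that fails in B."""
    id_a, id_b = tuple(range(n)), tuple(range(m))
    f_a = [id_a] + [tuple(f) for f in ops_a]  # index 0 plays the role of f_0
    f_b = [id_b] + [tuple(f) for f in ops_b]
    start = (id_a, id_a, id_b, id_b)
    seen, queue = {start}, deque([start])
    while queue:
        s_a, t_a, s_b, t_b = queue.popleft()
        if separates(s_a, t_a, s_b, t_b):
            return True
        for j in range(len(f_a)):
            for l in range(len(f_a)):
                nxt = (compose(f_a[j], s_a), compose(f_a[l], t_a),
                       compose(f_b[j], s_b), compose(f_b[l], t_b))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False
```

The search terminates because there are only finitely many tuples of unary operations on the two finite universes.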



How much space is used by this algorithm? Let p = max(n, m). Each of the
four unary operations can be represented as a vector of length p. Each such
vector requires $p \log(p) \le p^2$ bits. We also need space for the counter i, which
ranges from 0 to $n^n$. Since $\log(n^n) = n \log(n) \le p^2$, i requires another $p^2$ bits.
It follows that the total amount of space required is on the order of $p^2$ bits.

What is the size of the input? The algebra A requires $\log(n) + kn\log(n) \ge n$
bits. Similarly B requires at least m bits. The total input size is at least $n + m \ge p$
bits. It follows that our algorithm’s space requirements are bounded above by
the square of the size of the input.

Theorem 7. Var-Mem$^1$ and Var-Equiv$^1$ lie in PSPACE.

Proof. The above algorithm can be used to test whether (A, B) lies in the
complement of Var-Mem$^1$. Since the algorithm is nondeterministic, we get
Var-Mem$^1$ $\in$ co-NPSPACE. But from Savitch’s Theorem, NPSPACE =
PSPACE, and since every deterministic class is closed under complements,
co-PSPACE = PSPACE. Thus Var-Mem$^1$ $\in$ PSPACE. Now it follows from (2)
that Var-Equiv$^1$ $\in$ PSPACE as well. □

It is not clear whether this is the best possible bound for these two problems.
We leave it as an open question.

Problem 1. Is either Var-Mem$^1$ or Var-Equiv$^1$ PSPACE-complete?

One might hope to apply the same techniques used above to the unrestricted
problem, Var-Equiv. Unfortunately, the resources needed to evaluate an arbitrary
term in a given algebra jump dramatically as soon as we allow a binary
operation. Our approach instead is to try to construct the homomorphism
guaranteed by Theorem 6(iii). The best we seem to be able to do is the following
hyperexponential bound.

Theorem 8. Var-Equiv, Var-Mem $\in$ 2-EXPTIME.

5 Term-Equivalence 

Recall that the algebras A and B are term-equivalent if A = B and Clo(A) =
Clo(B). From a universal algebraic standpoint, term-equivalent algebras are
generally interchangeable. When considering the complexity of the problem
Term-Equiv, it is convenient, once again, to consider a slightly different problem.
Thus we define the problem Clo-Mem to consist of all pairs (F, g) in which
$F \cup \{g\}$ is a set of operations on some finite set A and $g \in \mathrm{Clo}_A(F)$. The problem
Clo-Mem$^1$ is similar, but we require all of the operations in $F \cup \{g\}$ to be
unary. For the problem Clo-Mem$_2$, the set A has cardinality 2.

Historically, Clo-Mem was the first problem, of those discussed in this paper, 
to be considered in the literature. Kozen proved in 1977 that Clo-Mem$^1$ is
complete for PSPACE. In 1982, Friedman proved that Clo-Mem is complete
for EXPTIME. However, that manuscript was never published. A proof of
this result appears in 

Let A = (A, F) and B = (B, G). Then

$(A, B) \in$ Term-Equiv $\iff$ $A = B$ & $(\forall g \in G)(\forall f \in F)\,\bigl((F, g), (G, f) \in$ Clo-Mem$\bigr)$.   (3)

Conversely, if $F \cup \{g\}$ is a set of operations on A, then

$(F, g) \in$ Clo-Mem $\iff$ $((A, F), (A, F \cup \{g\})) \in$ Term-Equiv.   (4)

Of course, similar relationships hold for the unary and two-element variants of 
these problems. 

Now it follows easily from (4) that Clo-Mem is log-space reducible to
Term-Equiv. This is true for the general, unary and two-element variants of the
problems. However, a reduction in the other direction is a bit problematic. Let us first
consider the general case. Given a pair (A, B) of size S, (3) tells us that we can
test $(A, B) \in$ Term-Equiv by making several calls to an algorithm for
Clo-Mem. The input to each such call will certainly have size at most S, hence will
run in time at most $2^{p(S)}$ for some polynomial p. Furthermore, there will clearly
be at most S such calls. Hence a bound on the running time for Term-Equiv
will be $S \cdot 2^{p(S)}$, which is still exponential in S. Combining our observations we
conclude that the (general) problem Term-Equiv is complete for EXPTIME.
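
The reduction from Term-Equiv to repeated Clo-Mem calls can be sketched with a naive clone-membership test that closes the projections under the basic operations until nothing new appears. This brute-force closure is an assumption of the sketch (it is not claimed to be the algorithm behind the EXPTIME bound), as are the (rank, table) representation and the function names.

```python
from itertools import product


def clone_member(n, ops, g_rank, g_table):
    """Decide whether the operation g is a term operation of ({0..n-1}, ops)."""
    points = list(product(range(n), repeat=g_rank))
    # Each g_rank-ary operation is stored as its vector of values over all argument tuples.
    generated = {tuple(p[i] for p in points) for i in range(g_rank)}  # the projections
    target = tuple(g_table[p] for p in points)
    changed = True
    while changed and target not in generated:
        changed = False
        for k, f in ops:
            for args in product(list(generated), repeat=k):
                h = tuple(f[tuple(a[idx] for a in args)] for idx in range(len(points)))
                if h not in generated:
                    generated.add(h)
                    changed = True
    return target in generated


def term_equivalent(n_a, ops_a, n_b, ops_b):
    """Same universe, and each basic operation of one algebra is a term operation of the other."""
    if n_a != n_b:
        return False
    return (all(clone_member(n_a, ops_a, k, g) for k, g in ops_b)
            and all(clone_member(n_a, ops_b, k, f) for k, f in ops_a))
```

As a usage example, the two-element algebra with conjunction and negation is term-equivalent to the one whose only operation is NAND, whereas conjunction alone and disjunction alone are not term-equivalent.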

For Clo-Mem$^1$ and Clo-Mem$_2$, we need to argue a bit differently, since we
will be interested in a space bound. Let C denote one of these two problems, and
let T denote the corresponding term-equivalence problem. Suppose we have an
algorithm for C that runs in space f(x) on an input of size x. As is commonplace,
we assume that f is a monotonically increasing function. In applying (3) to test
$(A, B) \in T$, the first call to C will require space bounded above by $O(f(S))$.
But subsequent calls to C can reuse the same space. Hence, the total space
requirement for T is on the order of $\log S + f(S)$. (The $\log S$ term accounts for
some counters.)

For the specific case C = Clo-Mem$^1$, the function f is a polynomial, so we
conclude that Term-Equiv$^1$ is complete for PSPACE. When C = Clo-Mem$_2$,
it turns out that f(S) is on the order of $\log S$. From this we get our final theorem.

Theorem 9. Clo-Mem$_2$ and Term-Equiv$_2$ lie in NL.

Abstract. We give an algorithm computing the branchwidth of interval
graphs in time $O(n^3 \log n)$. This method generalizes to permutation
graphs and, more generally, to trapezoid graphs. In contrast, we show
that computing branchwidth is NP-complete for splitgraphs and bipartite
graphs.



1 Introduction 

The research into the branchwidth of fundamental graph classes commenced 
in 0. The authors of this paper showed that the branchwidth problem can 
be solved in polynomial time for planar graphs. In this paper we present some 
further developments. 

The major reason for studying graph parameters like branchwidth is the
fast algorithms one can obtain for problems when restricted to graphs for which
this parameter is not too large. Treewidth and branchwidth are closely related
connectivity measures introduced by Robertson and Seymour. The treewidth
parameter has drawn most of the attention until now. The two parameters
differ by at most a small constant factor (more precisely, $\mathrm{branchwidth}(G) \le
\mathrm{treewidth}(G) + 1 \le \frac{3}{2}\,\mathrm{branchwidth}(G)$).

Our motivation for studying branchwidth is twofold. The ‘fast’ algorithms 
one obtains for various NP-complete problems are of mere theoretical interest 
in many cases, because of the huge constants involved. These constants appear 
in two stages of the algorithms. In the first stage one needs to construct a 
branch- or tree-decomposition of the graph with a small width. In this paper we 
show that the complexity of obtaining an optimal branch- or tree-decomposition 
can differ enormously. In the second stage one solves the seemingly difficult 
(i.e., NP-complete) problem at hand using the decomposition. The complexity 
of the second stage usually depends heavily on the width of the decomposition, 
and although the width of the tree- or branch-decomposition differs only by a 
constant factor, this can make or break the application. Therefore, obtaining 
efficient algorithms for the branchwidth of a graph is of independent interest. 

The complexity of the treewidth problem restricted to special graph classes 
was considered in various papers (see, e.g., 0). In this paper we investigate the 
computational complexity of the branchwidth problem for some of the most fun- 
damental classes of graphs (split graphs, bipartite graphs and interval graphs.) 
It should be noted that though the existence of an efficient algorithm for interval 
graphs is not unexpected, it is somewhat surprising that this algorithm is by no 
means straightforward and its correctness requires a nontrivial proof. We show 
that our efficient algorithm for interval graphs can be generalized to a parame- 
terized class of graphs called d-trapezoid graphs. As far as we know, besides 0 
and P, these are the first results dealing with the computational complexity of 
the branchwidth problem. 

2 Preliminaries 

The notion of branchwidth was introduced in jS| . We think it is more convenient 
to work with a more relaxed (yet equivalent) version of the branch-decomposition 
tree. 

Definition 1. A pair $(T, \tau)$ is a relaxed branch-decomposition if T is a tree
with vertices of degree at most three and $\tau$ is a surjective mapping which maps
every leaf of T to an edge in E. (Hence every edge in E is represented by at least
one leaf.) The order of an edge e in T is the number of vertices x of G such
that there are leaves $t_1$ and $t_2$ in different components of $T - e$ with x incident
both with $\tau(t_1)$ and with $\tau(t_2)$. The width of $(T, \tau)$ is the maximum order of an
edge of T. The branchwidth of G, $\beta(G)$, is the minimum width over all relaxed
branch-decompositions of G.

Consider a relaxed branch-decomposition $(T, \tau)$. For a vertex $v$ we denote
by $T_v$ the smallest subtree of $T$ that contains all leaves $\lambda$ such that
$\tau(\lambda)$ is incident with $v$.
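
The definition of order and width can be checked mechanically on small examples. The following is a minimal sketch, assuming the tree is given as a list of edges and the leaf mapping as a dictionary; the names `decomposition_width`, `tree_edges` and `leaf_to_edge` are illustrative and do not come from the paper. It computes the width by brute force and is not meant to be efficient.

```python
from collections import defaultdict


def decomposition_width(tree_edges, leaf_to_edge):
    """Width of a relaxed branch-decomposition (brute-force sketch).

    tree_edges   -- list of pairs (a, b), the edges of the tree T
    leaf_to_edge -- dict mapping each leaf of T to a graph edge (x, y)
    """
    adj = defaultdict(set)
    for a, b in tree_edges:
        adj[a].add(b)
        adj[b].add(a)

    def side_vertices(start, blocked):
        # Graph vertices incident with edges stored on leaves that are
        # reachable from `start` without crossing the tree edge {start, blocked}.
        seen, stack, found = {start}, [start], set()
        while stack:
            node = stack.pop()
            if node in leaf_to_edge:
                found.update(leaf_to_edge[node])
            for nxt in adj[node]:
                if nxt in seen or (node == start and nxt == blocked):
                    continue
                seen.add(nxt)
                stack.append(nxt)
        return found

    # The order of a tree edge is the number of graph vertices seen on both sides.
    return max(
        len(side_vertices(a, b) & side_vertices(b, a)) for a, b in tree_edges
    )
```

For a triangle whose three edges are placed on the leaves of a star with one central node, every tree edge separates one graph edge from the other two, so each order is 2.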

3 Splitgraphs and Bipartite Graphs 

A graph $G = (V, E)$ is called a splitgraph if $V$ can be split into an independent
set $I$ of $G$ and a clique $C$ of $G$. Such a graph is also denoted as $G = (I, C, E)$.

Since $C$ is a clique in $G$, $\beta(G) \ge \lceil \frac{2}{3}|C| \rceil$. It is easy
to see that $\beta(G) = 2k$ if $|C| = 3k$ and there is a partition of $C$ into three
sets $C_1$, $C_2$ and $C_3$, each of cardinality $k$, such that for every vertex
$i \in I$ there is an index $j \in \{1, 2, 3\}$ with

$$N(i) \cap C_j = \emptyset.$$

We will show that it is NP-complete to decide for a given hypergraph $(X, S)$
whether there is an admissible coloring, i.e., a 3-coloring of $X$ such that the color
classes are of equal size and every subset $s \in S$ contains at most two colors. The
NP-completeness of the branchwidth problem restricted to split graphs follows
by setting $C = X$ and $I = S$ with $N(s) = s$ for every $s \in S$.

The reduction is from GRAPH 3-COLORABILITY, see problem [GT4] in Garey and Johnson.
Let $G = (V, E)$ be an instance of GRAPH 3-COLORABILITY with $|V| = n > 0$
and $|E| = m$. We choose a vertex $v_0 \in V$ and a set $W$ such that $|W| = 2n$ and
$V \cap W = \{v_0\}$. The hypergraph $H = (X, S)$ is defined by
$X = (V \cup W) \times \{1, 2, 3\}$ and

$$S = \{\{(u,i),(v,j),(v,k)\} : \{u,v\} \in E \text{ and } \{i,j,k\} = \{1,2,3\}\} \;\cup\; \{\{(w,i) : w \in W \text{ and } i \ne j\} : j \in \{1,2,3\}\}.$$

Lemma 1. If $G$ is 3-colorable then $H$ has an admissible coloring.

Proof. Let $f : V \to \{1,2,3\}$ be a proper coloring of $G$. We define an admissible
coloring $c : X \to \{1, 2, 3\}$ by

$c(v, i) = j$ if and only if $v \in V$ and $f(v) + i \equiv j \pmod 3$, and
$c(w, i) = c(v_0, i)$ for all $w \in W$.

□ 
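
The construction of $H$ and the coloring of Lemma 1 can be tried out on a small graph. The sketch below is illustrative only: the function names are hypothetical, the elements of $W$ other than $v_0$ are represented by made-up tuples `("w", t)`, and the hyperedge set follows the reading of the definition of $S$ in which each edge $\{u,v\}$ contributes the triples $\{(u,i),(v,j),(v,k)\}$ in both orientations.

```python
from collections import Counter
from itertools import permutations


def build_hypergraph(vertices, edges, v0):
    """Return (X, S, W) for the reduction; W has 2n elements and contains v0."""
    n = len(vertices)
    W = [v0] + [("w", t) for t in range(2 * n - 1)]  # hypothetical labels
    X = {(y, i) for y in list(vertices) + W for i in (1, 2, 3)}
    S = []
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            for i, j, k in permutations((1, 2, 3)):
                if j < k:  # each unordered pair {j, k} once
                    S.append(frozenset({(a, i), (b, j), (b, k)}))
    for j in (1, 2, 3):
        S.append(frozenset((w, i) for w in W for i in (1, 2, 3) if i != j))
    return X, S, W


def lemma1_coloring(vertices, W, v0, f):
    """c(v, i) = j with f(v) + i = j (mod 3), colors taken in {1, 2, 3}."""
    c = {(v, i): (f[v] + i - 1) % 3 + 1 for v in vertices for i in (1, 2, 3)}
    for w in W:
        for i in (1, 2, 3):
            c[(w, i)] = c[(v0, i)]
    return c


def is_admissible(X, S, c):
    """Three equal color classes and at most two colors on every hyperedge."""
    sizes = Counter(c[x] for x in X)
    equal = len(sizes) == 3 and len(set(sizes.values())) == 1
    return equal and all(len({c[x] for x in s}) <= 2 for s in S)
```

On a triangle with a proper 3-coloring the resulting coloring is admissible, while a coloring of $G$ that gives two adjacent vertices the same color produces a hyperedge carrying all three colors.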



Lemma 2. For every admissible coloring $c$ of $H$ we have $c(w_1, i) = c(w_2, i)$ for
all $w_1, w_2 \in W$ and $i \in \{1, 2, 3\}$.

Proof. First we observe that $|X| = 9n - 3$ for $|V| = n$. We assume that there exist
two elements $w_1, w_2 \in W$ such that $c(w_1, i) \ne c(w_2, i)$ for some
$i \in \{1, 2, 3\}$. Then every vertex $(w, j)$, $w \in W$ and $j \in \{1, 2, 3\}$, is
colored either with color $c(w_1, i)$ or with color $c(w_2, i)$, since
$\{(w, 1), (w, 2) : w \in W\}$, $\{(w, 2), (w, 3) : w \in W\}$, and
$\{(w, 3), (w, 1) : w \in W\}$ are hyperedges of $H$. This implies that the union
of these two color classes contains at least $6n$ vertices of $H$, contradicting the
fact that the three color classes have equal cardinality $3n - 1$. □



Lemma 3. If $c$ is an admissible coloring of $H$ and $\{u, v\}$ is an edge of $G$ such
that all three colors appear on $(v, 1)$, $(v, 2)$, and $(v, 3)$, then all three colors
appear on $(u, 1)$, $(u, 2)$, and $(u, 3)$.

Proof. For simplicity we assume $c(v, i) = i$ for $i = 1, 2, 3$ and, on the contrary,
$c(u, 1) = c(u, 2)$. If $c(u, 1) \ne 3$ then on the hyperedge
$\{(u, c(u, 1)), (v, 3 - c(u, 1)), (v, 3)\}$ appear all three colors. Otherwise, if
$c(u, 1) = 3$ then on the hyperedge $\{(u, c(u, 3)), (v, 3 - c(u, 3)), (u, 3)\}$
appear all three colors. □

Lemma 4. If $G$ is connected and $H$ has an admissible coloring then $G$ has a
proper 3-coloring.

Proof. It follows from Lemma 2 that all three colors appear at $(v_0, 1)$, $(v_0, 2)$,
and $(v_0, 3)$. Now an inductive argument using Lemma 3 shows that all three
colors appear at $(v, 1)$, $(v, 2)$, and $(v, 3)$ for every vertex $v \in V$. We define a
3-coloring $f : V \to \{1, 2, 3\}$ of $G$ by $f(v) = c(v, 3)$ for all $v \in V$. For every
edge $\{u, v\} \in E$ we have $f(u) \ne f(v)$ since $c(u, 3) \in \{c(v, 1), c(v, 2)\}$
by Lemma 3. □

Theorem 1. The branchwidth problem is NP-complete when restricted to split 
graphs. 

Proof. The proof is a direct consequence of Lemmas 1 and 4. □

It is easy to see that if $H$ is a proper subdivision of an arbitrary graph
$G$ then $\beta(H) = \max(1, \beta(G))$. It follows that the branchwidth problem is
also NP-complete when restricted to bipartite graphs.

4 Interval Graphs 

An ordering $\mathcal{X} = (X_1, \ldots, X_\ell)$ of the maximal cliques of $G$ is called
a consecutive clique arrangement (cca for short) if for every vertex of $G$, the maximal
cliques containing this vertex occur consecutively in the ordering. Equivalently,
$X_{i-1} \cap X_{i+1} \subseteq X_i$ for all $i = 2, \ldots, \ell - 1$. It is well known
that $G$ is an interval graph if and only if $G$ has a cca. There is a linear time
algorithm that either constructs
a cca or detects that the input is not an interval graph E| . 
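
Whether a given ordering of cliques has the consecutiveness property is easy to test directly. The sketch below assumes the cliques are given as a list of Python sets in the proposed order; the name `is_cca` is illustrative, and the function does not verify that the sets are the maximal cliques of a graph, nor is it the linear time recognition algorithm referred to above.

```python
def is_cca(cliques):
    """Return True if every vertex occupies a consecutive run of cliques."""
    first, last, count = {}, {}, {}
    for idx, clique in enumerate(cliques):
        for x in clique:
            first.setdefault(x, idx)  # index of the first clique containing x
            last[x] = idx             # index of the last clique containing x
            count[x] = count.get(x, 0) + 1
    # x is consecutive exactly when it appears in every clique between
    # its first and its last occurrence.
    return all(last[x] - first[x] + 1 == count[x] for x in count)
```

The dictionaries `first` and `last` play the role of the functions $f(x)$ and $l(x)$ used later in the running time analysis.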

By $[a, b]$ we denote the interval $\{a, a + 1, \ldots, b\}$. Let $X_{a,b}$ be shorthand
for $\bigcup_{i=a}^{b} X_i$. A fragmentation of $\mathcal{X}$ is a partition $\mathcal{F}$
of $[1, \ell]$ into intervals $[a_1, b_1], [a_2, b_2], \ldots, [a_t, b_t]$.

Definition 2. Let $\mathcal{X} = (X_1, \ldots, X_\ell)$ be a cca of $G$. We say that
$X_{a,b}$ is a $k$-fragment if and only if

(i) $|X_{a,b}| \le \frac{3}{2}k$,

(ii) $|X_{a-1} \cap X_{a,b}| \le k$ and $|X_{b+1} \cap X_{a,b}| \le k$,

(iii) $|X_{a,b} \cap (X_{a-1} \cup X_{b+1})| \le k$ or $|X_{a,b}| + |X_{a-1} \cap X_{b+1}| \le 2k$.

A $k$-fragmentation of $\mathcal{X}$ is a partition $\mathcal{F}$ of $[1, \ell]$ into
intervals such that for every $[a, b] \in \mathcal{F}$ the set $X_{a,b}$ is a $k$-fragment.
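
The three conditions of Definition 2 translate directly into set operations. The following sketch assumes the cca is a Python list of sets with $X_1$ stored at index 0 and treats $X_0$ and $X_{\ell+1}$ as empty; the name `is_k_fragment` and the calling convention are illustrative assumptions, and all inequalities are taken to be non-strict.

```python
def is_k_fragment(cliques, a, b, k):
    """Check conditions (i)-(iii) for X_{a,b}; a and b are 1-based, a <= b."""

    def clique(i):
        # X_0 and X_{l+1} are empty by convention.
        return cliques[i - 1] if 1 <= i <= len(cliques) else set()

    block = set().union(*(clique(i) for i in range(a, b + 1)))
    prev, nxt = clique(a - 1), clique(b + 1)

    cond_i = 2 * len(block) <= 3 * k  # |X_{a,b}| <= (3/2) k
    cond_ii = len(prev & block) <= k and len(nxt & block) <= k
    cond_iii = (
        len(block & (prev | nxt)) <= k  # claw condition
        or len(block) + len(prev & nxt) <= 2 * k  # path condition
    )
    return cond_i and cond_ii and cond_iii
```

For a single clique on four vertices the test succeeds for $k = 3$ and fails for $k = 2$, matching the clique bound $\lceil \frac{2}{3} \cdot 4 \rceil = 3$.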

For an interval graph $G$ given with a cca we show in this section that $\beta(G) \le k$
if and only if the cca has a $k$-fragmentation. For the proof of the “only if” part
of this claim we need some lemmas.

Suppose $G$ has a branch-decomposition $(T, \tau)$ of width at most $k$. Let $a$ be
a node of $T$. We call the connected components of $T - a$ the branches of $T$ with
respect to $a$ (shortly the $a$-branches). For every vertex $x \in V(G)$, we denote by
$T_x$ the subtree of $T$ with leaves $a \in V(T)$ such that $x \in \tau(a)$. For every
clique $X \subseteq V(G)$, let $T(X)$ be the subtree of $T$ induced by
$\bigcap_{x \in X} V(T_x)$. It follows from the Helly property that
$T(X) \ne \emptyset$ for every clique $X$ of $G$.

Definition 3. A graph $G$ is saturated, if $|V(T(X))| = 1$ for every maximal
clique $X$ of $G$ and every relaxed branch-decomposition $(T, \tau)$ of width $\beta(G)$.

If $G$ is a saturated interval graph with cca $(X_1, \ldots, X_\ell)$ and
branch-decomposition $(T, \tau)$, we denote by $T(i)$ the unique vertex of $T(X_i)$.

Lemma 5. Let $(X_1, \ldots, X_\ell)$ be a cca of an interval graph $G$. There exists a
saturated interval graph $G'$ with cca $(X'_1, \ldots, X'_\ell)$ such that
$\beta(G') = \beta(G)$ and $X'_i \cap V(G) = X_i$ for $i = 1, 2, \ldots, \ell$.

Proof. Let $k = \beta(G)$. We choose an interval graph $G'$ with $\beta(G') = k$ obtained
from $G$ by adding a maximal number of new vertices which appear in exactly one
maximal clique of $G'$ each. We will show that $G'$ is saturated. Let
$(X'_1, \ldots, X'_\ell)$ be the cca of $G'$ such that $X'_i \cap V(G) = X_i$ for
$i = 1, 2, \ldots, \ell$. Since $|X'_i| \le \frac{3}{2}k$ for every $i$, such a graph
$G'$ exists.

We consider a relaxed branch-decomposition $(T', \tau')$ of $G'$ of width $k$. Suppose
for the contrary that $T'(X'_i)$ has more than one vertex for some particular
$i$. Then all the trees $T'_x$, $x \in X'_i$, contain a common edge of $T'$, say $e$
(and hence $|X'_i| \le k$). Let $G''$ be the interval graph obtained from $G'$ by adding
a new vertex $x$ to $X'_i$.

We define a branch-decomposition $(T'', \tau'')$ as follows: Take a copy of $K_{1,3}$
with central vertex $\alpha$ and leaves $\alpha_1$, $\alpha_2$, and $\alpha_3$, disjoint
with $T'$. Subdivide the edge $e$ of $T'$ by a new extra node and identify this node with
$\alpha_3$. Let $n_1 = \lceil |X'_i|/2 \rceil$ and $n_2 = \lfloor |X'_i|/2 \rfloor$.
Subdivide the edge $\{\alpha, \alpha_1\}$ by $n_1 - 1$ new vertices and pend leaves
$\alpha_{1,j}$, $j = 1, 2, \ldots, n_1 - 1$, on them (one on each). Similarly, create
$n_2 - 1$ leaves $\alpha_{2,j}$ along the edge $\{\alpha, \alpha_2\}$. The resulting tree
is $T''$. Partition $X'_i$ into two almost equal parts $Y_1$ and $Y_2$ of sizes
$|Y_1| = n_1$ and $|Y_2| = n_2$. Define $\tau''$ so that its restriction to
$\{\alpha_1\} \cup \{\alpha_{1,j} : j = 1, 2, \ldots, n_1 - 1\}$ is a bijection onto the
set of edges $\{\{x, y\} : y \in Y_1\}$, and similarly the restriction to
$\{\alpha_2\} \cup \{\alpha_{2,j} : j = 1, 2, \ldots, n_2 - 1\}$ is a bijection onto the
set of edges $\{\{x, y\} : y \in Y_2\}$. For other leaves $u$ of $T''$,
$\tau''(u) = \tau'(u)$.

Clearly $(T'', \tau'')$ is a branch-decomposition of $G''$ of width $k$ contradicting
the choice of $G'$. □

From now on, we will assume that $G$ itself is saturated. (If we show that $G'$
allows a $k$-fragmentation, the inherited fragmentation of $G$ is a $k$-fragmentation
as well.) Note that assuming $\beta(G) \ge 2$, $T(i)$ is never a leaf of the decomposition.

Lemma 6. If $G$ has a relaxed branch-decomposition $(T, \tau)$ of width $k = \beta(G)$
such that $T(i) = T(h) \ne T(j)$ for some $i < j < h$, then $G$ has a decomposition
$(\tilde T, \tilde\tau)$ of the same width such that $\tilde T(i) \ne \tilde T(h)$ and
such that no two clique-representatives are unified in $(\tilde T, \tilde\tau)$ unless
they were unified in $(T, \tau)$.

Proof. Let $\alpha$ and $\delta$ be the two neighbors of $T(i)$ that are not in the
$T(i)$-branch containing $T(j)$. Let $T^*$ be the subtree rooted at $T(i)$ containing
$\alpha$ and $\delta$.

Take two copies $T^*_1$ and $T^*_2$ of $T^*$. We will trim the leaves of $T^*_1$ and
$T^*_2$. In $T^*_1$ we leave only those leaves that are mapped by $\tau$ onto edges
$\{x, y\}$ where both $x$ and $y$ belong to $X_{1,j}$, and in $T^*_2$ we leave the leaves
mapped onto edges $\{x', y'\}$ where both $x'$ and $y'$ belong to $X_{j,\ell}$. Remove
from $T$ the subtrees at $\alpha$ and $\delta$ which do not contain $T(i)$ and replace
them by $T^*_1$ and $T^*_2$, respectively. Denote this decomposition
$(\tilde T, \tilde\tau)$.

Note first that $(\tilde T, \tilde\tau)$ is again a relaxed branch-decomposition for $G$.
Indeed, if an edge $\{x, y\} \in E(G)$ has both endpoints $x, y \in X_{1,j}$ and there was
a leaf $v \in T^*$ such that $\tau(v) = \{x, y\}$, then there remains a copy $\tilde v$
of $v$ in $T^*_1$ such that $\tilde\tau(\tilde v) = \{x, y\}$. If such a $v$ lies in the
$T(i)$-branch that contains $T(j)$ then $\tilde v = v$. Similarly for edges $\{x, y\}$
with $x, y \in X_{j,\ell}$.

Next we show that the width of $(\tilde T, \tilde\tau)$ does not exceed $k$. The orders of
edges in the $T - T^*$ subtree of $T$ did not change. Let $u$ be the neighbor of $T(i)$ on
the path to $T(j)$. If $x \in V(G)$ is such that $\tilde T_x$ contains the edge
$\{\alpha, T(i)\}$, then $x \in \tilde\tau(v)$ for some leaf $v \in T^*_1$ and
$x \in \tilde\tau(v')$ for another leaf $v' \in \tilde T - T^*_1$. If $v' \in T - T^*$
then $T_x$ contains the edge $\{T(i), u\}$. If $v' \in T^*_2$ then necessarily
$x \in X_j$ and again $T_x$ contains the edge $\{T(i), u\}$, since $T_x$ contains a leaf
in $T^*$ and $T(j) \in T_x$. In any case, the order of the edge $\{\alpha, T(i)\}$ in
$\tilde T$ does not exceed the order of the edge $\{T(i), u\}$ in $T$. Similarly for the
edge $\{\delta, T(i)\}$. An analogous argument shows that the orders of edges within
$T^*_1$ ($T^*_2$) did not increase.

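Denote by $u_1$ ($u_2$) the copy of vertex $u \in T^*$ in $T^*_1$ (in $T^*_2$,
respectively). (In particular, $T(i)_1 = \alpha$ and $T(i)_2 = \delta$.) Observe that

$$\tilde T(m) = \begin{cases} T(m) & \text{if } T(m) \notin T^*, \\ T(m)_1 & \text{if } T(m) \in T^* \text{ and } m < j, \\ T(m)_2 & \text{if } T(m) \in T^* \text{ and } m > j. \end{cases}$$

This is because $\tilde T(m)$ (defined as above) belongs to $\tilde T_x$ for every
$x \in X_m$, and $|\tilde T(X_m)| = 1$. Therefore
$\tilde T(i) = \alpha \ne \tilde T(h) = \delta$ and $\tilde T(m) = \tilde T(m')$ only if
$T(m) = T(m')$. □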



Corollary 1. For the graph $G$ with a cca $\mathcal{X}$, there exists a fragmentation
$\mathcal{F}$ and a relaxed branch-decomposition $(T, \tau)$ of width $k = \beta(G)$ such
that $T(i) = T(j)$ if and only if $i, j$ belong to the same interval of $\mathcal{F}$.

Proof. For example, every decomposition with the maximum number of distinct
$T(i)$'s defines such a fragmentation. □

Assume now that a fragmentation $\mathcal{F}$ as in Corollary 1 is fixed. We will further
consider the interval supergraph $G^{\mathcal{F}}$ of $G$ such that
$\{X_{a,b} : [a, b] \in \mathcal{F}\}$ is the set of maximal cliques of $G^{\mathcal{F}}$.
This graph has again branchwidth $k = \beta(G)$ (in a branch-decomposition satisfying
Corollary 1, for every $x \in X_i$, $y \in X_j$ such that $i, j$ belong to the same
interval of the fragmentation, i.e., $T(i) = T(j)$, and $xy \notin E(G)$, pend a leaf
$v_{xy}$ to a new vertex subdividing an edge incident with $T(i)$ which is contained in
$T_x \cap T_y$, and set $\tau(v_{xy}) = \{x, y\}$). Again, in every relaxed
branch-decomposition $(T, \tau)$ of width $k$, the cliques $X^{\mathcal{F}}_i$ have unique
representatives $T(i)$, $i = 1, 2, \ldots, t$. For the sake of brevity we will now assume
that $G = H^{\mathcal{F}}$ for some graph $H$. In other words, we assume that $G$ is
saturated and $G$ has a relaxed decomposition of width $k$ in which the representatives of
the maximal cliques are all distinct.

Definition 4. Let $X_i$, $X_j$, and $X_h$ be maximal cliques of a saturated graph $G$
with relaxed branch-decomposition $(T, \tau)$. We say that the representatives of these
cliques are in claw-position if there exists a node $a$ of $T$ such that every branch
of $T$ with respect to $a$ contains one of the vertices $T(i)$, $T(j)$ and $T(h)$. We say
that they are in $(i, j, h)$-path-position, if $T(i)$ and $T(h)$ lie in different
branches of $T$ with respect to $T(j)$.

Note that for pairwise distinct $T(i)$, $T(j)$ and $T(h)$, exactly one of the
following statements is true:

- $T(i)$, $T(j)$, and $T(h)$ are in claw-position,

- $T(i)$, $T(j)$, and $T(h)$ are in $(i, j, h)$-path-position,

- $T(i)$, $T(j)$, and $T(h)$ are in $(j, i, h)$-path-position,

- $T(i)$, $T(j)$, and $T(h)$ are in $(i, h, j)$-path-position.

For $i < j < h$, cliques with representatives in the last two mentioned positions
are said to be in wrong-order positions.

Lemma 7. If $G$ has a relaxed branch-decomposition $(T, \tau)$ of width $k = \beta(G)$
such that clique representatives are distinct and $T(i-1)$, $T(i)$ and $T(i+1)$ are
in $(i-1, i+1, i)$-path-position for some $i$, then $G$ has a decomposition
$(\tilde T, \tilde\tau)$ of the same width such that $\tilde T(i-1)$, $\tilde T(i)$,
$\tilde T(i+1)$ are in claw-position and such that no two clique representatives coincide
and no other triple $(j-1, j, j+1)$ in wrong-order position is created.

Proof. We rebuild the decomposition $(T, \tau)$ similarly as in the proof of Lemma 6.
This time T* is the tree rooted in T{i -\- 1) that does not contain T(z) and T 
is constructed by deleting T* from T, adding two new vertices a, 6 adjacent to 
T{i -\- 1) and adding Tf rooted in a and Tf rooted in 6 (again, only the leaves 
mapped by r onto edges with both endpoints in Xi^i are left in T*, and Tf 
contains only copies of those leaves of T* which are mapped by r onto edges with 
both endpoints in W,^)- Clearly, (T, f ) is a relaxed branch-decomposition of G of 
width k, all clique representatives are still distinct and T{i — 1) = — G Tf, 

T{i) = T{i) and T{i -|- 1) = (5 are in claw-position. 

It remains to show that no other triple $(j-1, j, j+1)$ with clique-representatives
in wrong-order position was created. For the contrary, suppose that
$\tilde T(j-1)$, $\tilde T(j)$ and $\tilde T(j+1)$ are in wrong-order position but
$T(j-1)$, $T(j)$, $T(j+1)$ were not. It follows that two of $\tilde T(j-1)$,
$\tilde T(j)$, $\tilde T(j+1)$ are in $T^*_1$ and the last one is in $T^*_2$, or vice
versa (two of $\tilde T(j-1)$, $\tilde T(j)$, $\tilde T(j+1)$ are in $T^*_2$ and the
third one is in $T^*_1$). But $\tilde T(h)$ is in $T^*_1$ ($T^*_2$) only if $h < i$
($h > i$, respectively), and thus the indices of the newly created triple with clique
representatives in wrong-order position cannot be consecutive. □

Corollary 2. Every cca of a saturated interval graph $G$ has a fragmentation
$\mathcal{F}$ and a relaxed branch-decomposition $(T, \tau)$ of width $\beta(G)$ such
that all representatives $T(i)$, $i = 1, 2, \ldots, t$, are distinct and for every
$i = 2, 3, \ldots, t-1$, the representatives $T(i-1)$, $T(i)$ and $T(i+1)$ are either in
claw-position or in $(i-1, i, i+1)$-path-position.

Proof. Obviously, Lemma 7 has a symmetric variant that kills a triple
$(T(i-1), T(i), T(i+1))$ in $(i, i-1, i+1)$-path-position. Therefore any relaxed
branch-decomposition with minimum number of triples $(T(i-1), T(i), T(i+1))$ in
wrong-order position has actually no triples $(T(i-1), T(i), T(i+1))$ in wrong-order
position. □

The “only if” part of Theorem 2 below now follows from the following two
lemmas.

Lemma 8. There exists a relaxed branch-decomposition of width $k$ for
$G[X_p \cup X_q \cup X_r]$ in which the representatives of cliques $X_p$, $X_q$, $X_r$,
$p < q < r$, are in claw-position if and only if $|X_i| \le \frac{3}{2}k$ (for
$i = p, q, r$) and $|X_q \cap (X_p \cup X_r)| \le k$.

Proof. Suppose $G[X_p \cup X_q \cup X_r]$ has a decomposition $(T, \tau)$. The
inequalities $|X_i| \le \frac{3}{2}k$ are obvious. Let $u$ be the vertex of $T$ such that
$T(p)$, $T(q)$ and $T(r)$ lie in different $u$-branches, and let $e_p$, $e_q$, $e_r$ be
the edges incident with $u$ on the paths towards $T(p)$, $T(q)$, $T(r)$, respectively. Set
$V_{ij} = \{x \in X_p \cup X_q \cup X_r : e_i, e_j \in T_x\}$ for $i, j = p, q, r$. Then
$X_p \cap X_q \subseteq V_{pq}$ (since for $x \in X_p \cap X_q$, $T(p), T(q) \in T_x$ and
$e_p$, $e_q$ lie on the unique path connecting $T(p)$ and $T(q)$ in $T$) and similarly,
$X_r \cap X_q \subseteq V_{rq}$. Therefore the order of $e_q$ is at least
$|V_{pq} \cup V_{rq}|$ and $|X_q \cap (X_p \cup X_r)| \le k$ follows.

On the other hand, suppose $X_p$, $X_q$, $X_r$ satisfy the conditions. Take a vertex
$u$ adjacent to $X_p$, $X_q$ and $X_r$ and add leaves mapping onto the edges of the
cliques near the vertices $X_p$, $X_q$, $X_r$ so that $T(i) = X_i$ for $i = p, q, r$.
Since $|X_i| \le \frac{3}{2}k$ (for $i = p, q, r$), this can be done so that the orders of
the edges incident with the clique representatives are at most $k$. It only remains to
show that the orders of the edges incident with $u$ are small as well. For every
$i \ne j \in \{p, q, r\}$ and $x \in X_i \cap X_j$, the tree $T_x$ contains the path
$X_i, u, X_j$, and so the order of the edge $\{X_i, u\}$ is $|X_i \cap (X_j \cup X_h)|$
(where $j, h$ are such that $\{i, j, h\} = \{p, q, r\}$). For $i = q$,
$|X_q \cap (X_p \cup X_r)| \le k$ by the assumption. For $i = p$,
$X_p \cap (X_q \cup X_r) \subseteq X_p \cap X_q$ (since $X_p \cap X_r \subseteq X_q$ as
$p < q < r$) and hence
$|X_p \cap (X_q \cup X_r)| \le |X_p \cap X_q| \le |X_q \cap (X_p \cup X_r)| \le k$.
Similarly for $i = r$. □

Lemma 9. There exists a relaxed branch-decomposition of width $k$ for
$G[X_p \cup X_q \cup X_r]$ in which the representatives of cliques $X_p$, $X_q$, $X_r$,
$p < q < r$, are in $(p, q, r)$-path-position if and only if $|X_i| \le \frac{3}{2}k$
(for $i = p, q, r$), $|X_i \cap X_j| \le k$ (for $i \ne j$, $i, j = p, q, r$) and
$|X_q| + |X_p \cap X_r| \le 2k$.

Proof. Suppose a decomposition $(T, \tau)$ exists. Call $e_1$, $e_2$ and $e_3$ the
outgoing edges of $T(q)$ such that $e_1$ is on the path towards $T(p)$ and $e_3$ is on the
path towards $T(r)$.

Let $\alpha$ be the number of vertices $x$ of $X_q$ such that $e_1, e_3 \in T_x$, $\beta$
the number of vertices such that $e_1, e_2 \in T_x$ and $\gamma$ the number of vertices
such that $e_2, e_3 \in T_x$. Then we can find a suitable assignment exactly when we can
find numbers $\alpha$, $\beta$ and $\gamma$ satisfying:

1. $\alpha \ge |X_p \cap X_r|$, $\alpha + \beta \ge |X_p \cap X_q|$, $\alpha + \gamma \ge |X_q \cap X_r|$

2. $\alpha + \beta + \gamma = |X_q|$

3. $\alpha + \beta \le k$, $\alpha + \gamma \le k$ and $\beta + \gamma \le k$

It is easy to see that this system of inequalities has a solution exactly when the
given restrictions are satisfied. □



Theorem 2. Let $\mathcal{X} = (X_1, \ldots, X_\ell)$ be an arbitrary cca of an interval
graph $G$. For $k \ge 2$, $\beta(G) \le k$ if and only if a $k$-fragmentation of
$\mathcal{X}$ exists.

Proof. The “only if” part follows from Lemma 8 and Lemma 9. We proceed with
the converse. Let $\mathcal{F} = \{[a_i, b_i] : i \in [1, t]\}$ be a $k$-fragmentation of
$\mathcal{X}$ such that $a_i - 1 = b_{i-1}$ for every $i = 2, 3, \ldots, t$. We will show
that this implies $\beta(G) \le k$. For simplicity let $X_0 = X_{\ell+1} = \emptyset$.

Following the constructions in Lemmas 8 and 9 we choose branch-decompositions
$(T_i, \tau_i)$ of $G[X_{a_i,b_i}]$ such that $T_i$ contains edges $e_i$ and $f_i$ with

$$e_i \in \bigcap_{x \in X_{a_i - 1} \cap X_{a_i}} E(T_{i,x}) \quad\text{and}\quad f_i \in \bigcap_{x \in X_{b_i + 1} \cap X_{b_i}} E(T_{i,x}).$$

If the claw condition $|X_{a,b} \cap (X_{a-1} \cup X_{b+1})| \le k$ is fulfilled then
$e_i = f_i$, and if the path condition $|X_{a,b}| + |X_{a-1} \cap X_{b+1}| \le 2k$ is
fulfilled then $e_i$ and $f_i$ are adjacent. If for $X_{a_i,b_i}$ the path condition
holds, then we subdivide $e_i$ and $f_i$ by additional vertices $c_i$ and $d_i$,
respectively. If for $X_{a_i,b_i}$ the claw condition holds, then we subdivide
$e_i = f_i$ by an additional vertex, which is adjacent to an additional leaf. In this case
$c_i = d_i$ is this leaf. Now we obtain $T$ by adding edges $\{d_{i-1}, c_i\}$ for
$i \in [2, t]$, and we define $\tau(\lambda) = \tau_i(\lambda)$ if the leaf $\lambda$ of
$T$ is also a leaf of $T_i$.

Obviously $(T, \tau)$ is a relaxed branch decomposition of $G$ and the width of
$(T, \tau)$ is at most $k$. □

Below we present a procedure to check whether the branchwidth of $G$ is at
most $k$, i.e., we check whether $[1, \ell]$ has a $k$-fragmentation. We use a boolean
array $A[a, b]$ to indicate whether $[a, b]$ has a $k$-fragmentation. We show that this
procedure can be implemented such that it runs in $O(n^3)$ time. We use a boolean
vertex versus clique matrix with the consecutive ones property for rows. For each
vertex $x$ let $f(x)$ be the index of the first clique containing $x$ and $l(x)$ be the
index of the last clique containing $x$. Clearly the functions $f(x)$ and $l(x)$ can
be computed in $O(n^2)$ time. Notice that now $X_{a,b}$ can be computed in $O(n)$
time for every pair $a, b$, since $x \notin X_{a,b}$ iff $l(x) < a$ or $f(x) > b$. Since
there are $O(n^2)$ of these unions, all these can be computed in $O(n^3)$ time. We obtain
the following theorem.

Theorem 3. There exists an $O(n^3 \log n)$ algorithm to compute the branchwidth
of an interval graph.

The result can be extended to d-trapezoid graphs. We refrain from giving
the details.

Input: A graph $G$ with a cca $(X_1, \ldots, X_\ell)$ and an integer $k$.

Output: A statement whether the branchwidth of $G$ is at most $k$.

begin
    $X_0 \leftarrow \emptyset$; $X_{\ell+1} \leftarrow \emptyset$;
    for $d \leftarrow 0$ to $\ell$ do
        for $b \leftarrow d + 1$ to $\ell$ do
        begin
            $a \leftarrow b - d$;
            if $|X_{a-1} \cap X_a| > k$ or $|X_b \cap X_{b+1}| > k$
            then $A[a, b] \leftarrow$ FALSE
            else
            begin
                if $|X_{a,b}| \le \frac{3}{2}k$ and
                   ($|X_{a,b} \cap (X_{a-1} \cup X_{b+1})| \le k$ or
                    $|X_{a,b}| + |X_{a-1} \cap X_{b+1}| \le 2k$)
                then $A[a, b] \leftarrow$ TRUE
                else
                begin
                    $A[a, b] \leftarrow$ FALSE;
                    for $c \leftarrow a$ to $b - 1$ do
                        $A[a, b] \leftarrow A[a, b]$ or ($A[a, c]$ and $A[c + 1, b]$)
                end
            end
        end
    if $A[1, \ell]$ then output “$\beta(G) \le k$” else output “$\beta(G) > k$”
end
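
The procedure can be transcribed almost line by line. The sketch below is one possible reading, assuming the cca is a list of Python sets and that all comparisons are non-strict as in Definition 2; the names `has_k_fragmentation` and `smallest_k` are illustrative. It recomputes each union $X_{a,b}$ from scratch rather than via $f(x)$ and $l(x)$, so it does not attain the stated running time. The binary search relies on the characterization of Theorem 2, which is stated only for $k \ge 2$, so the second function returns the smallest such $k$.

```python
def has_k_fragmentation(cliques, k):
    """Dynamic program over intervals [a, b] of the cca (1-based indices)."""
    l = len(cliques)
    X = [set()] + [set(c) for c in cliques] + [set()]  # X[0], X[l+1] empty
    A = [[False] * (l + 2) for _ in range(l + 2)]
    for d in range(l):
        for b in range(d + 1, l + 1):
            a = b - d
            if len(X[a - 1] & X[a]) > k or len(X[b] & X[b + 1]) > k:
                continue  # A[a][b] stays False
            block = set().union(*X[a:b + 1])
            claw = len(block & (X[a - 1] | X[b + 1])) <= k
            path = len(block) + len(X[a - 1] & X[b + 1]) <= 2 * k
            if 2 * len(block) <= 3 * k and (claw or path):
                A[a][b] = True
            else:
                # Try every split of [a, b] into [a, c] and [c + 1, b].
                A[a][b] = any(A[a][c] and A[c + 1][b] for c in range(a, b))
    return A[1][l]


def smallest_k(cliques):
    """Smallest k >= 2 admitting a k-fragmentation, found by binary search."""
    n = len(set().union(*cliques))
    lo, hi = 2, max(2, n)  # k = n always works: the whole cca is one fragment
    while lo < hi:
        mid = (lo + hi) // 2
        if has_k_fragmentation(cliques, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
```

For a single clique on four or five vertices this yields 3 and 4, in agreement with the bound $\lceil \frac{2}{3}|C| \rceil$ for cliques.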



5 Concluding Remarks 

One of the basic questions we could not settle thus far is the complexity of 
branchwidth for cocomparability graphs. Also for the subclass of cobipartite 
graphs, the complexity is an open problem. 

Abstract. We present a new technique called balanced randomized tree
splitting. It is useful in constructing unknown trees recursively. By applying
it we obtain two new results on efficient construction of evolutionary
trees: a new upper time-bound on the problem of constructing an
evolutionary tree from experiments, and a relatively fast approximation
algorithm for the maximum agreement subtree problem for binary trees
for which the maximum number of leaves in an optimal solution is large.

We also present new lower bounds for the problem of constructing an 
evolutionary tree from experiments and for the problem of constructing 
a tree from an ultrametric distance matrix. 

1 Introduction 

Several of the known efficient algorithms for trees rely on their excellent separator 
properties. It is well known that each tree contains a vertex whose removal splits 
it into components of balanced size. Unfortunately, finding such a vertex usually 
requires the knowledge of the tree. In this paper, we consider a more general 
situation when the tree is unknown and we can obtain some partial information 
on its topology at some cost. More precisely, the partial information is in the 
form of the topological subtree induced by a subset of the leaves and the cost 
corresponds to the time taken by the construction of the subtree in a given 
model. We introduce an efficient randomized technique of balanced splitting of 
an unknown tree termed balanced randomized tree splitting. It can be used to 
construct an unknown tree recursively. Our technique seems to be especially 
useful in the efficient construction of evolutionary trees. 

The problem of constructing evolutionary trees is central in computational
biology and has been studied in several papers. An evolutionary tree is a tree
where the leaves represent species and internal nodes represent common ancestors,
see Fig. 1 and Fig. 2 for two examples. There are many different approaches to the
problem of constructing an evolutionary tree depending, among other things, on
what kind of data is available.

A well known variant of the problem of constructing an evolutionary tree is
based on experiments. An experiment determines how three species are related
in the evolutionary tree, i.e., returns the topological subtree (without weights
on the edges) for the three species. Fig. 1 shows an example of a tree and the
outcome of experiments. Note that there are two different types of trees one can
get from an experiment.

Fig. 1. An example of experiment results and the corresponding evolutionary
tree.

The problem of constructing evolutionary trees using experiments has been
studied by Kannan, Lawler, and Warnow [5]. In [5], they present algorithms
for binary evolutionary trees. The fastest of them runs in time $O(n \log n)$ and
performs at most $4n \log_2 n$ experiments. Experiments are expensive and hence it
is important to minimize their number. The algorithm that performs the smallest
number of experiments presented in [5] runs in time $O(n^2)$ and performs at most
$n \log_2 n$ experiments. For trees not restricted to be binary Kannan et al. present
an $O(n^2)$-time algorithm, which performs $O(dn \log n)$ experiments, where $d$ is the
maximum degree in the tree, and for the unrestricted degree case they present
an $O(n^2)$-time algorithm that performs $O(n^2)$ experiments [5].

By using our new technique of balanced randomized tree splitting, we show
that an evolutionary tree for $n$ species can be determined, using experiments, in
expected time $O(nd \log n \log\log n)$, where $d$ is the maximum degree in the tree.
This is a dramatic improvement over the quadratic time-bound of Kannan et al.,
causing only a slight, practically negligible, increase in the number of experiments.

In [S] it is also shown that the lower bound on the number of experiments 
required to construct the tree in worst case is In this paper we show an 

imax{(deg(u))^ -I- ^^{nlogn)} lower bound on the 

number of experiments, where $u$ is an internal vertex of maximum degree and
$IV(T)$ is the set of internal vertices in the tree. In particular, if the tree contains
$\Omega(n/d)$ vertices of degree $\Omega(d)$, $\Omega(n(\log n + d))$ experiments are required. We
derive analogous lower bounds for the number of entries in the so called ultra- 
metric distance matrix (see Sect. E2J that every algorithm constructing the tree 
from it has to access. 

Another popular variant of the problem of constructing an evolutionary tree
is called the maximum agreement subtree problem (MAST for short), or maximum
homeomorphic subtree problem (MHT for short). Given $k$ rooted leaf
labeled trees $T_1, \ldots, T_k$, each with $n$ leaves uniquely labeled by elements from an
$n$-element set $A$, MAST asks for a rooted tree $T$ with the maximum number of
leaves uniquely labeled by some elements from $A$, such that for $i = 1, \ldots, k$, $T$ is
homeomorphic to the subtree of $T_i$ induced by the leaves of $T$. See Fig. 2 for an
example.

MAST has been studied in many papers. Keselman and Amir showed that MAST is
NP-complete, even for three trees, but on the other hand there are polynomial
time algorithms for the two-tree case and the case where some of the input trees
has maximum degree bounded by a constant. Polynomial time algorithms are also
known for the undirected two-tree version of MAST, where the input trees are
unrooted. The best known algorithm, due to Farach, Przytycka, and Thorup,
for MAST for $k$ $n$-leaf trees, some of which has maximum degree bounded by $d$,
runs in time $O(kn^3 + n^d)$.

In practice it is often the case that the input trees are both binary and very
similar, and only a small fraction of the leaves has to be removed to find an
agreement subtree. For this reason, we have considered the problem of an
efficient approximation of MAST in the case when the number of leaves in the
MAST tree is a large fraction of $n$. For large $n$, such an approximation can be
useful as a test of whether or not the optimal solution is sufficiently large, and
so interesting enough, to run the more costly exact algorithm to produce it.
In this paper, we present an algorithm which for $k$ input $n$-leaf binary trees,
admitting an agreement subtree on at least $\beta n$ leaves, where $\beta \in (0.8, 1]$,
constructs an agreement subtree whose expected number of leaves is at least $0.4\beta n$
in time 0{kn^^^ n). This approximation algorithm is our second example 

of successful application of the technique of balanced randomized tree splitting. 

Sect. 2 presents some general results on balanced randomized tree splitting.
In Sect. 3 the upper bound on constructing a tree from experiments is shown and
the lower bounds are presented. Sect. 4 presents the approximation algorithm
for MAST.

2 Balanced Randomized Splitting of an Unknown Tree 

To find a balanced splitting of the unknown tree T, we randomly pick a sample 
of its leaves and build the topological subtree T' induced by the sample. The 
removal of the vertices of T' from T splits T into components of balanced number 
of leaves with high probability. 

Given a tree $T$ and a subset $S$ of leaves of $T$, the topological subtree $T'$ of $T$
induced by $S$, denoted by $T \| S$, is the minimum size tree that has $S$ as the set
of its leaves and is homeomorphic to the subtree of $T$ composed of paths in $T$
joining each pair of leaves in $S$.

Fig. 2. Three binary input trees with six species yield a maximum agreement
subtree with five species.

For an edge $(u, w)$ of $T$, let $T_{u,w}$ denote the subtree of $T$ rooted at $u$ and
composed of all nodes reachable from $w$ via $u$.

For a pair of vertices $w_1, w_2$ of $T$, let $T(w_1, w_2)$ denote the forest composed
of all distinct subtrees $T_{u,w}$ where $w$ is an internal vertex on the path joining
$w_1$ with $w_2$ in $T$ and $u$ is a vertex of $T$ outside this path.

A component of $T - T'$ is either (1) a subtree $T_{u,w}$ where $w \in V(T')$ and $u$
does not belong to any path in $T$ joining $w$ with an adjacent vertex in $T'$ or (2)
a forest $T(w_1, w_2)$ where $(w_1, w_2)$ is an edge of $T'$.
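
When the tree $T$ is known, the topological subtree $T \| S$ defined above can be obtained by pruning and contracting. The following is a minimal sketch under simplifying assumptions that are not part of the paper: the tree is treated as unrooted and given as an adjacency dictionary, so a root of degree two would be suppressed like any other degree-two vertex, and the name `topological_subtree` is illustrative. In the setting of the paper $T$ is of course unknown and $T'$ has to be built from experiments instead.

```python
def topological_subtree(adj, sample):
    """Adjacency dict of the topological subtree induced by the leaf set `sample`."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    sample = set(sample)

    # Step 1: repeatedly delete vertices of degree <= 1 that are not sampled,
    # leaving the union of the paths joining the sampled leaves.
    stack = [v for v in adj if len(adj[v]) <= 1 and v not in sample]
    while stack:
        v = stack.pop()
        if v not in adj:
            continue
        for u in adj.pop(v):
            adj[u].discard(v)
            if len(adj[u]) <= 1 and u not in sample:
                stack.append(u)

    # Step 2: suppress vertices of degree two. Contracting such a vertex
    # does not change the degree of its two neighbours.
    for v in [v for v in adj if len(adj[v]) == 2 and v not in sample]:
        x, y = adj.pop(v)
        adj[x].discard(v)
        adj[y].discard(v)
        adj[x].add(y)
        adj[y].add(x)
    return adj
```

In a tree with two cherries joined through a common vertex, sampling one leaf of the first cherry and both leaves of the second leaves a star with three leaves.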

Theorem 1. Let $T$ be a tree on $n$ leaves. Let $2 \le n_0 \le n$ and $k \ge 2$. For
any sample $S$ of at least $2k\frac{n}{n_0}\log n$ leaves chosen uniformly at random among
the $n$ leaves of $T$, and the topological subtree $T'$ of $T$ induced by $S$, each of
the components of $T - T'$ contains less than $n_0$ leaves with probability at least
1 

Proof. Note that T has at least two edges. For a pair of vertices w\, W 2 of T, 
suppose that T(wi,W 2 ) has at least no leaves of T. The probability that none 
of the leaves is chosen in S is not greater than (1 — — ) "0 , i.e., it is not 

greater than Since T has at most n vertices, there are less than forests 
T{wi,W 2 ). Consequently, the probability that there is such a forest T{w\,W 2 ) 
with at least no leaves of T none of which is chosen to S is less than .^ 2 k -2 which 
is less than ^ since k >2. Each of the components oiT — T' is included in a 
forest of the form T(wi,W 2 ), therefore we conclude that each of them contains 
less than uq leaves with probability larger than 1 — n~^ . □ 

Note that Theorem 1 implies the existence of a single vertex in $T'$ whose
removal from $T$ partitions the leaves of $T$ into balanced size components with high
probability.

Corollary 1. Let $T$ be a tree on $n$ leaves. Let $2 \le n_0 \le n/2$ and $k \ge 2$. For any
sample $S$ of at least $2k\frac{n}{n_0}\log n$ leaves chosen uniformly at random among the
$n$ leaves of $T$, the topological subtree $T'$ of $T$ induced by $S$ contains a vertex of
$T$ whose removal disconnects $T$ into subtrees none of which contains more than
n/2 -|- no leaves with probability at least 1 — n~^ . Given T' and the size of the 
components ofT — T' , such a vertex can be found in time O((n/no) logn). 

Proof. Transform $T'$ to an auxiliary tree $T^*$, by breaking each edge $e = (w_1, w_2)$
into two edges $(w_1, w_e)$, $(w_e, w_2)$, where $w_e$ is a new vertex uniquely associated
with $e$, and adding for each component $C$ of $T - T'$ a unique leaf $l_C$. If $C$ is of
the form $T_{u,w}$ then $l_C$ is adjacent to the vertex $w$ in $T'$. Otherwise, if $C$ is of
the form $T(w_1, w_2)$ then $l_C$ is adjacent to $w_e$ where $e = (w_1, w_2)$. For each leaf
$l_C$ of $T^*$, set its weight to the number of leaves of $T$ in $C$. For each of the
remaining vertices $w$ of $T^*$, set the weight of $w$ to zero.

To prove the thesis, it is sufficient to find a vertex of $T'$ whose removal
disconnects $T^*$ into subtrees none of which contains vertices of total weight
greater than $n/2 + n_0$.
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By Theorem n, each vertex of T* has weight not exceeding ng, which does 
not exceed n/2 by our assumptions, with probability at least 1 — n~^. 

Root T* at a vertex of T'. If for each child of the root the total weight of
descendant vertices is not greater than n/2 then we are done. Otherwise, walk
down the tree, always in the direction of the child for which the total weight of
descendant vertices is larger than n/2. If there is no such child, stop. Note that
since each leaf of T* has weight not larger than n/2, the above procedure stops
at an internal vertex v of T*. Also, by the definition of v, its removal disconnects
T* into subtrees each containing vertices of total weight not greater than n/2.
If v is a vertex of T', we are done. Otherwise, we choose the parent w of v in
the rooted T*. Since v in this case has only two children, one of which is a leaf
of weight less than n_0, it follows that w satisfies the thesis in this case.

Given T' and the sizes of the components of T − T', the above procedure can
be easily implemented in time linear in the size of T'. □
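The walk used in the proof can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the adjacency-dictionary representation and the names `find_balanced_vertex`, `adj`, `weight` and `root` are assumptions made here, and the final adjustment step of the proof (moving to the parent when the stopping vertex is not a vertex of T') is left out.

```python
def find_balanced_vertex(adj, weight, root):
    """Walk from `root` towards the heavy child until no child subtree
    carries more than half of the total weight (hypothetical helper)."""
    # Root the tree with an iterative DFS, recording parents and a preorder.
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for u in adj[v]:
            if u != parent[v]:
                parent[u] = v
                stack.append(u)

    # Accumulate subtree weights bottom-up; vertices without an entry weigh 0.
    sub = {v: weight.get(v, 0) for v in order}
    for v in reversed(order):
        if parent[v] is not None:
            sub[parent[v]] += sub[v]

    half = sub[root] / 2
    v = root
    while True:
        heavy = [u for u in adj[v] if u != parent[v] and sub[u] > half]
        if not heavy:
            return v
        v = heavy[0]  # at most one child can exceed half of the total
```

The preprocessing and the walk each touch every vertex a constant number of times, which matches the linear-time claim for a tree of the size of T'.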

Corollary 1 is mostly interesting in the situation when the number of leaves
in the components of T induced by the removal of the vertices in T' can be
determined faster than it takes to build the whole T.

3 Improved Bounds on Building Trees from Experiments

3.1 An Upper Bound 

We use the result from Sect. 2 to derive a new upper time bound on the
construction of an evolutionary tree from experiments, depending on the
maximum degree of the constructed tree. The following procedure will yield our
upper bound.

Algorithm BuildTree(L) 

Input: A set L of species for which experiments can be made. 

Output: An evolutionary tree for L. 

1. Pick a random sample S of species from L of size 8⌈log |L|⌉;

2. Build an evolutionary tree T' for S by using the quadratic-time algorithm
from [5];

3. Determine the components C_1, ..., C_l of T − T' where T is the evolutionary
tree to construct;

4. Augment each of the components of T − T' of the form T(x,y) by a single
species labeling a leaf of T' lying below the lowest common ancestor of x
and y (keeping unchanged the remaining components);

5. For each augmented component Ci do 

if Ci contains at most log n species then construct the evolutionary tree for 
Ci by using the quadratic-time algorithm 
else BuildTree(Ci) 

6. Hang the evolutionary trees recursively computed for the augmented
components C_i at T' as follows:

if C_i is of the form T_{x,y} then link the root of the evolutionary tree for C_i to
y,
if C_i has been originally of the form T(x,y) then identify the root of the
evolutionary tree for the augmented C_i with the lowest common ancestor of
x, y, and identify its leaf labeled by the additional species with the other
vertex in {x, y}.

The next fact is useful for analyzing the time complexity of BuildTree(L). 

Fact 1 (see [5]) Let v be an internal node of the evolutionary tree for a set U
of species. Let W ⊆ U, and let W_1, ..., W_q be the splitting of W into non-empty
components induced by v. For any u ∈ U − W, we can determine whether u
belongs to any of the components of U induced by v that is a superset of one of
the components W_1, ..., W_q, and if so, also the index of the component, in time
O(q) by performing ⌈q/2⌉ experiments.



Theorem 2. An evolutionary tree for n species can be determined in expected
O(nd log n log log n) time, where d is the maximum degree in the tree.

Proof. Let L be the set of n species. We use BuildTree(L) to prove the theorem.
The correctness of this procedure follows from the correctness of combining the
evolutionary trees recursively computed for the augmented components in step 6.

As for the expected running time of BuildTree(L), it is easily seen to be 
dominated by steps 2, 3 and the “if” part of step 5. 

To estimate the total work (including recursive calls) required in step 2 and
the “if” part of step 5 note that each internal node of T can appear in at most
one of the evolutionary trees T' built for the samples. Let n_1, ..., n_q be the sizes
of the samples drawn by the algorithm. We have ∑_{i=1}^{q} n_i ≤ n and n_i ≤ 8⌈log n⌉
for i = 1, ..., q. Hence, the total work for constructing the evolutionary trees T'
for the samples (step 2) does not exceed

$$O\Bigl(\max_{\sum_i n_i \le n,\; n_i \le 8\lceil \log n\rceil} \sum_{i=1}^{q} n_i^2\Bigr) = O\Bigl(\frac{n}{\log n}\,\log^2 n\Bigr) = O(n\log n).$$

Analogously, the total time taken by the construction of the evolutionary trees
for the components of size not greater than log n (the “if” part of step 5) is
O((n/log n) log^2 n) = O(n log n).

In order to derive our upper bound on the expected time required by step
3, we consider the following method of determining the components of T − T'.

First, we find a (separator) vertex v of T' whose removal disconnects T' into
subtrees T'_i, i = 1, ..., q, none of which has more than | of the vertices of T'. Now,
we can determine the leaf sets of the subtrees of the form T_{u,v}, where (u, v) is an
edge of T, by performing at most ⌈d/2⌉n experiments in time O(dn) by Fact 1.
In this way, in particular we can determine the components of T − T' of the form
T_{u,v}. Let u_1 through u_l be the neighbors of v in T', and L_1 through L_l be the
leaf sets assigned to the branches (v, u_1) through (v, u_l) by the aforementioned
experiments. Note that for i = 1, ..., l, L_i is just the leaf set of a subtree T_{u,v}
where u is the second vertex on the path from v to u_i in T. Let T'_i be the
subtree of T' corresponding to the branch (v, u_i), i.e., the maximal subtree of
T' including u_i and excluding v. For i = 1, ..., l, we analogously find a vertex
separator v_i of T'_i and perform at most ⌈d/2⌉|L_i| experiments in order to split
L_i into the subsets of leaf sets of the subtrees of the form T_{u,v_i}. Importantly,
note that if u_i = v_i then the subset of L_i assigned to the branch (u_i, v) yields the
component T(v, v_i). By proceeding in this way recursively, we can determine all
the components of T − T'.

A recursive separator partition of T' in the form of a tree of vertex separators
of T' can be found in time linear in the size of T', i.e., in time O(log n). Hence,
the total time taken by finding such separator partitions including the recursive
calls of the BuildTree procedure is easily seen to be O(n).

In order to estimate the expected total work taken by determining the com- 
ponents of the tree T restricted to the current component with respect to the 
evolutionary tree T' for the sample for the current component let us make the 
following observation. 

Since the recursive separator partition of T' has depth logarithmic in the size
of T', each species labeling a leaf of T takes part in O(log log n) leaf sets that are
partitioned by the experiments in order to determine the components of T − T'.
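The logarithmic depth of a recursive separator partition can be checked on small trees with the following sketch. It is an illustration under stated assumptions rather than the procedure of the paper: the separator is taken to be a vertex leaving components with at most half of the vertices (the exact fraction is not legible in the text above), the tree is given as an adjacency dictionary, and the name `separator_partition_depth` is hypothetical. For clarity it recomputes subtree sizes at every level, so it does not attain the linear-time bound mentioned above.

```python
def separator_partition_depth(adj, nodes=None):
    """Depth of a recursive vertex-separator partition of a tree (sketch)."""
    nodes = set(adj) if nodes is None else nodes
    # Root the current connected piece anywhere and compute subtree sizes.
    start = next(iter(nodes))
    parent, order, stack = {start: None}, [], [start]
    while stack:
        v = stack.pop()
        order.append(v)
        for u in adj[v]:
            if u in nodes and u != parent[v]:
                parent[u] = v
                stack.append(u)
    size = {v: 1 for v in order}
    for v in reversed(order):
        if parent[v] is not None:
            size[parent[v]] += size[v]
    n = len(order)

    def pieces(v):
        # Sizes of the components left after deleting v from the current piece.
        below = [size[u] for u in adj[v] if u in nodes and parent[u] == v]
        return below + [n - size[v]]

    # Assumed balance condition: no remaining component has more than n/2 vertices.
    sep = next(v for v in order if 2 * max(pieces(v)) <= n)

    depth = 0
    for u in adj[sep]:
        if u not in nodes:
            continue
        # Collect the component containing u once sep is removed.
        comp, todo = {u}, [u]
        while todo:
            w = todo.pop()
            for x in adj[w]:
                if x in nodes and x != sep and x not in comp:
                    comp.add(x)
                    todo.append(x)
        depth = max(depth, separator_partition_depth(adj, comp))
    return 1 + depth
```

With the half-size condition, every level at least halves the piece size, so the depth is at most ⌊log₂ m⌋ + 1 for a tree with m vertices.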

For a species s, let n_1, ..., n_h be the sizes of the components that s belongs
to during the performance of the algorithm, where n_1 = n, n_i > log n for i =
1, ..., h − 1, and n_h ≤ log n. Let h(n) = log n − log log n.

By Theorem 1 it follows that the probability that n_{i+1} < n_i/2 holds for
i = 1, ..., min(h(n), h) − 1 is at least - i) > ~ ^

Consequently, Prob(h < h(n)) > e~^ holds. Hence, by Markov's inequality the
expected value of h is at most eh(n).

Thus, the expected number of leaf sets s belongs to during our algorithm is
O(log n log log n). For each leaf set species s belongs to, s has to participate in
O(nd) experiments (determining the components of the tree T restricted to the
current component with respect to the splitting trees). The expected total work
and the expected total number of experiments are therefore O(nd log n log log n).

□ 



3.2 Lower Bounds 

We shall show a lower bound on the number of experiments required by any 
algorithm constructing an evolutionary tree from them. We do this by firstly 
showing a lower bound on the number of entries in the matrix that has to be 
accessed by any algorithm constructing a tree from the so called ultrametric 
distance matrix. 

A distance matrix D for n species is an n × n matrix where all elements D_ij
are non-negative real numbers. The value D_ij represents the distance between
the species i and j. For example, D_ij could represent the distance between DNA
sequences for species i and j. If it is possible to construct a tree, with edge
weights, such that the distance between every pair of leaves equals the
corresponding value in the matrix, then the matrix is said to be additive. If the tree
can be rooted in such a way that the distances from the root to all leaves are
equal, then the matrix is called ultrametric.
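As a small illustration of the definition, the sketch below tests a matrix for ultrametricity. It relies on the standard three-point condition (for every three species, the two largest of the three pairwise distances are equal), which is an equivalent characterisation brought in here as an assumption; it is not stated in the text. The function name `is_ultrametric` and the tolerance parameter `tol` are likewise hypothetical.

```python
from itertools import combinations


def is_ultrametric(D, tol=1e-9):
    """Check symmetry, a zero diagonal and the three-point condition."""
    n = len(D)
    for i in range(n):
        if abs(D[i][i]) > tol:
            return False
        for j in range(n):
            if abs(D[i][j] - D[j][i]) > tol:
                return False
    for i, j, k in combinations(range(n), 3):
        # Sort the three pairwise distances; the two largest must coincide.
        _, middle, largest = sorted((D[i][j], D[i][k], D[j][k]))
        if abs(largest - middle) > tol:
            return False
    return True
```

For example, the matrix with D_01 = 2 and D_02 = D_12 = 6 passes the test, while changing D_02 to 5 breaks the condition.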

Culberson and Rudnicki gave an O(n^2)-time algorithm for the problem of
constructing a tree from an additive distance matrix Q, and showed it to be
optimal. When the degree in the tree is bounded by d, their algorithm runs in
time O(dn log_d n).

Our lower bound on constructing a tree from an ultrametric distance matrix 
is based on the following simple lemma. 

Lemma 2. Let T be a tree realizing an ultrametric distance matrix M. Let v
be an internal vertex of T, and let C_1, ..., C_k be the connected components of
T resulting from removing v. For i, j ∈ {1, ..., k}, i ≠ j, any tree-realizability
algorithm that constructs T on the basis of M without any a priori knowledge
on M has to access at least one entry of M corresponding to a leaf in C_i and a
leaf in C_j.

Proof. Suppose otherwise, i.e., that a tree-realizability algorithm does not access
any entry of M corresponding to a leaf in C_i and a leaf in C_j. Let m be the
minimum distance between a leaf in C_i and a leaf in C_j. Transform M to another
tree-realizable matrix M' by decreasing the distance between each leaf in C_i and
each leaf in C_j by m − ε, so that in the tree T' realizing M' the components C_i and C_j
are forced to form the same component with respect to the vertex corresponding
to v. The algorithm would output the tree T instead of the correct one, i.e.,
T'. □



Theorem 3. Let T be a tree realizing an ultrametric distance n × n matrix M,
and let u be an internal vertex of T of maximum degree. Any tree-realizability
algorithm that constructs T on the basis of M without any a priori knowledge
on M has to access

$$\max\Bigl\{\binom{\deg(u)}{2} + \sum_{v \ne u} \binom{\deg(v)-1}{2},\; \Omega(n\log n)\Bigr\}$$

entries in M, where the sum ranges over the internal vertices v ≠ u of T.

Proof. The Ω(n log n) bound directly follows from the corresponding
information-theoretic lower bound on the number of experiments determining the topology
of evolutionary subtrees on three leaves necessary to find out the topology of
the whole evolutionary tree due to Kannan, Lawler and Warnow [Bj. The lower
bound of Kannan et al. is valid even in case of binary evolutionary trees. Since
the topology of a tree on three leaves can be determined by accessing three
entries of M respectively indexed by the leaves, we conclude that Ω(n log n)
accesses to the entries of M are required in order to determine the topology of
T.

To prove the first lower bound, root T at u. For each non-leaf vertex v of
the rooted T, at least $\binom{\deg(v)-1}{2}$ entries of M corresponding to pairs of leaves
a, b where a and b are in different subtrees rooted at the children of v have
to be accessed by Lemma 2 (for v = u, the corresponding number is at least
$\binom{\deg(u)}{2}$). Now, it is sufficient to note that such a pair a, b can only be counted
at the lowest common ancestor of a and b in the rooted T. □

Corollary 2. For a tree T realizing an ultrametric distance matrix M and
having its leaves pending from n/(d − 1) internal vertices of degree d each, any
tree-realizability algorithm that constructs T on the basis of M without any a
priori knowledge on M has to access Ω(n(log n + d)) entries of M.

Since ultrametric distance matrices are a special case of distance matrices the 
above lower bounds hold also for distance matrices in general. These lower 
bounds can be also easily translated into corresponding ones for algorithms con- 
structing evolutionary trees from experiments on three species. 

As pointed out by Kannan et al., an experiment on three species could be
implemented using an ultrametric distance matrix, by looking at its three entries. Hence,
if we could construct the evolutionary tree by performing less than, say, g(n)
experiments, then we could construct it by looking at less than 3 · g(n) entries in
the matrix. Combining this observation of Kannan et al. with our lower bounds
for ultrametric distance matrices, we obtain the following theorem.
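The reduction from experiments to matrix look-ups can be made concrete with a short sketch. It assumes that an experiment on three species reports which pair is grouped together in the evolutionary tree, or that the three form an unresolved star; the function name `experiment` and the tolerance `tol` are hypothetical. Each call reads exactly three entries of the matrix.

```python
def experiment(D, a, b, c, tol=1e-9):
    """Simulate one experiment on species a, b, c using three entries of an
    ultrametric matrix D. Returns the closest pair, or None for a star."""
    dab, dac, dbc = D[a][b], D[a][c], D[b][c]
    smallest = min(dab, dac, dbc)
    closest = [pair
               for pair, dist in (((a, b), dab), ((a, c), dac), ((b, c), dbc))
               if dist - smallest <= tol]
    # In an ultrametric matrix either one pair is strictly closest,
    # or all three distances are equal (an unresolved triple).
    return closest[0] if len(closest) == 1 else None
```

An algorithm that performs g(n) such experiments therefore inspects at most 3 · g(n) entries, which is the counting step used above.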

Theorem 4. Let T be the evolutionary tree for a set of n species, and let u be
an internal vertex of T of maximum degree. Any algorithm that constructs T
on the basis of experiments without any a priori knowledge on T has to perform
at least

$$\frac{1}{3}\max\Bigl\{\binom{\deg(u)}{2} + \sum_{v \ne u} \binom{\deg(v)-1}{2},\; \Omega(n\log n)\Bigr\}$$

experiments, where the sum ranges over the internal vertices v ≠ u of T.

In particular, if T contains Ω(n/d) vertices of degree Ω(d), Ω(n(log n + d))
experiments are required.

4 An Approximation Algorithm for MAST 

In practical applications of the maximum agreement subtree problem, one has
typically a large number of binary (or, at least bounded degree) input trees and
looks for a large agreement subtree covering the majority of the input species.
Although there are known polynomial-time algorithms for MAST when some of
the input trees have maximum degree bounded by a constant BUD, they are
not practical as they require Ω(kn^3) time for k input trees, each on n leaves. For
this reason, it seems worthwhile to develop faster approximation algorithms for the
variants of MAST occurring in practice. They could be used to test whether the
input instance has a large enough agreement subtree to justify running a costly exact
algorithm on it. In this section, we present an efficient method for verifying
whether or not the binary input trees have a large maximum agreement subtree
based on our technique of balanced randomized tree splitting. Because of space
considerations and for the sake of presentation clarity, we omit an extension of
our method to include input trees of constantly bounded degree. Our method
for binary trees is specified as follows.

Algorithm MAST(T_1, ..., T_k)

Input: binary trees T_1, ..., T_k, each on n leaves uniquely labeled with 1 through
n

Output: an agreement subtree U of T_1, ..., T_k

1. Randomly pick a subset L of log^^® n] leaf labels (species) in {1, 2, ..., n}.

2. For i = 1, ..., k, set R_i to T_i || L.

3. Find the maximum agreement subtree J for R_1, ..., R_k by using the algorithm
from [3].

4. For each edge (x,y) of J and i = 1, ..., k, determine the set L_i[x,y] of leaf
labels in the forest T_i(x,y). Set L[x,y] to ∩_{i=1}^{k} L_i[x,y].

5. For each edge (x,y) of J, if L[x,y] ≠ ∅ and y is a descendant of x in J
then construct the maximum agreement subtree J[x,y] for the trees
T*_i || (L[x,y] ∪ {a_1, ..., a_{|L[x,y]|+1}}), i = 1, ..., k, where T*_i is the tree obtained from
T_i by deleting all descendants of y and instead hanging at y a binary tree
(the same for i = 1, ..., k) with |L[x,y]| + 1 leaves uniquely labeled with the
elements a_1 through a_{|L[x,y]|+1} different from the leaf labels of T_i.

6. Expand J to the tree U as follows:

for each edge (x,y) of J, where y is a descendant of x, if L[x,y] ≠ ∅ then
identify x with the root of J[x,y] and y with an arbitrary leaf of J[x,y]
with a label of the form a_j.

7. Set U to the subtree of U induced by the leaves labeled with leaf labels of
T and output it.

Note that the maximum agreement subtree J[x,y] always contains a leaf with a
label of the form a_j since the binary tree with |L[x,y]| + 1 leaves uniquely labeled
with the elements a_1 through a_{|L[x,y]|+1} is an agreement subtree for the same
trees including more than half of the possible leaf labels. Hence, the following
lemma easily follows from the construction of U.

Lemma 3. The tree U output by MAST(T_1, ..., T_k) is an agreement subtree for
the input trees T_1, ..., T_k.

The analysis of time complexity of MAST(T_1, ..., T_k) is a bit more complicated.

Lemma 4. For the binary input trees T_1, ..., T_k, each on n leaves,
MAST(T_1, ..., T_k) runs in O(kn^/^ log®^^ n) time with probability at least 1 −

Proof. Steps 1, 2 are easily seen to take log^^^ n)-time or O(kn)-time,
respectively. By using the algorithm for MAST from [3], Step 3 can be implemented
in time O(k(n^/® log^^® n)^). In Step 4, the sets L_i[x,y] can be determined by
standard tree searches in time O(kn). Note that by Theorem 1 each of them is
of size log^^^ n) with probability 1 − n~^ . Consequently, each of the sets
L[x,y] is of the size O(n^/® log^^® n) with probability 1 − n~^ . The sets L[x,y]
can be computed in Step 4 by sorting lexicographically the sets L_i[x,y] in total
time O(kn). By [2], the highly probable O(n^/® log^^^ n) bound on the size of
L[x,y], and the n upper bound on the sum of the sizes of L[x,y], the agreement
subtrees J[x,y] can be computed in time O((n^/^/ log^^^ n)(k(n^/^ log^^® n)^))
with probability at least 1 − n~^ in Step 5. The final expansion and pruning
steps take O(n) time. □

The analysis of approximation properties of MAST(T_1, ..., T_k) is more involved.
To begin with, we need the following probabilistic lemma.

Lemma 5. Let T be a tree on n leaves and let β ∈ (0.8, 1]. Next, let T' be the
topological subtree of T induced by a sample of at least (2β − 1) log^^^ n]
leaves of T chosen from β⌈n^/® log^^® n⌉ leaves which are chosen uniformly at
random from the n leaves of T. The probability that there is a set of at most
(1 − β) log^^® n] edges (x,y) such that the forests T(x,y) totally contain at
least (1 − 0.4β)n leaves of T is ^

Proof. Each of the edges (x,y) of T' is determined by at most three leaves of
the sample, two to determine the lower endpoint of (x,y) as their lowest common
ancestor, and one additional to determine the higher endpoint. On the other
hand, in the best case, (1 − β) log^^® n] + 1 leaves in the sample can
already determine all the (1 − β) log^^^ n] edges (x,y). For this reason, we
can bound the aforementioned probability from above, for y = log^^^ n, by

/ n \ /rn-(l-0.4/3)n-a:-(l-/3)!/l\ /(' n \ 
^p-d)v<x<Sp-d)y \\x+{l-l3)y]) \ \/3y-x-{l-/3)y] )i\[/3y])' 

By straightforward calculations using the Stirling approximation formula, we 
obtain the lemma thesis. □ 



Theorem 5. Let β ∈ (0.8, 1]. If the number of leaves in a maximum agreement
subtree of the trees T_1, ..., T_k is at least βn then the expected number of leaves in
the agreement subtree U produced by MAST(T_1, ..., T_k) is at least 0.4βn.

Proof. Let S be a maximum agreement subtree of T_1, ..., T_k. Suppose first that
at least β⌈n^/® log^^® n⌉ elements of L belong to S. It follows that there is an
agreement subtree for the R_i's of size not less than β⌈n^/® log^^^ n⌉ induced by L ∩ S.
Hence, a maximum agreement subtree for the R_i's is of size at least β⌈n^/® log^^® n⌉.
It might contain at most log^^^ n~\ − β\n^^^ log^^^ n] leaf labels outside S.
Hence, it contains at least (2β − 1) log^^^ n] leaf labels in S ∩ L.

Let S' be the agreement subtree for the R_i's induced by the restriction of the leaf
labels in the maximum agreement subtree J for the R_i's to S ∩ L. Clearly, S' is also
an agreement subtree for the T_i's. It can also be obtained from S by restricting
S to the leaf labels of S'. It follows that S' is a topological subtree of both
J and S.

Each maximal subtree of J with leaf labels in L − S can have its root on at
most one path in J that one-to-one corresponds to a single edge of S'. It follows
that the number of edges in S' that do not appear in J is bounded from above by
the number of leaf labels of J outside S, i.e., it is at most (1 − β) log^^^ n~\

If an edge (x,y) of S' is also an edge of J then MAST(T_1, ..., T_k) in Step 5 will
find a super-tree of a maximum agreement subtree for the leaf labels of S'(x,y).
We conclude by Lemma 5 that under the preliminary assumption that at least
β⌈n^/^ log^^® n⌉ elements of L belong to S, the agreement subtree U produced
by MAST(T_1, ..., T_k) contains at least n − (1 − 0.4β)n leaf labels with very high
probability. Now it is sufficient to note that the preliminary supposition holds
with the probability at least i in order to get the theorem thesis. □

5 Final Remarks 



The technique of balanced randomized tree splitting presented in this paper 
should be useful in finding efficient algorithms for several other problems involv- 
ing construction of unknown trees (both within computational biology as well 
as outside it). 

Our approximation algorithm for MAST presumably yields a much better
approximation than that stated in Theorem 5. We conjecture that the expected
number of leaves in the agreement subtree produced by it is at least (the mini-
mum number of edges in S' that appear in J times the expected size of a forest
S'(x,y), which is ) |/9n.

Abstract. We use the notion of potential maximal clique to characterize
the maximal cliques appearing in minimal triangulations of a graph. We 
show that if these objects can be listed in polynomial time for a class of 
graphs, the treewidth and the minimum fill-in are polynomially tractable 
for these graphs. Finally we show how to compute in polynomial time 
the potential maximal cliques of weakly triangulated graphs. 



1 Introduction 

The notion of treewidth was introduced by Robertson and Seymour in HZ). It 
plays a major role in graph algorithm design. Indeed, it has been shown that 
many classical NP-hard problems become polynomial and even linear when re- 
stricted to graphs with small treewidth. These algorithms use a tree decompo- 
sition or a triangulation of the input graph, which is a chordal supergraph, i.e. 
all the cycles with at least four vertices of the supergraph have a chord. Com- 
puting the treewidth consists in finding a triangulation of minimum cliquesize. 
A related problem is the minimum fill-in problem, which consists in finding a
triangulation of a graph such that the number of added edges is minimum. This 
parameter is used in sparse matrix factorization. 

When computing the treewidth or the minimum fill-in, we are looking for 
triangulations of a graph. In both cases we can restrict ourselves to triangulations
that are minimal by inclusion, which we call minimal triangulations. Both problems are
NP-complete. Nevertheless, these parameters can be computed in polynomial
time for several classes of graphs such as chordal bipartite graphs HZIE], circle 
and circular-arc graphs [Bl ESI EE], AT-free graphs with polynomial number of 
separators PI and HHD-free graphs |2j. Most of these algorithms use the fact 
that these classes of graphs have a polynomial number of minimal separators. It 
was conjectured in nacm that the treewidth and the minimum fill-in should be 
tractable in polynomial time for all the graphs having a polynomial number of 
minimal separators. The conjecture is still open. 

A potential maximal clique of a graph is a vertex set which induces a maximal 
clique in some minimal triangulation of the graph. We show here that if one can 
list in polynomial time all the potential maximal cliques of some class of graphs, 
then the treewidth and the minimum fill-in of those graphs can be computed in
polynomial time. This notion is related to the work of [Q, from which we can
easily deduce that the potential maximal cliques of the previously cited classes
of graphs can be listed in polynomial time.

The class of weakly triangulated graphs, introduced in H , is a class of graphs
with polynomial number of separators, probably the only one for which the
treewidth and minimum fill-in problems were still open. We give an algorithm
computing the potential maximal cliques of these graphs. Consequently, the
treewidth and the minimum fill-in of weakly triangulated graphs are computable
in polynomial time.

2 Chordal Graphs and Minimal Separators 

Throughout this paper we consider connected, simple, finite, undirected graphs. 

A graph H is chordal (or triangulated) if every cycle of length at least four has
a chord. A triangulation of a graph G = (V, E) is a chordal graph H = (V, E')
such that E ⊆ E'. H is a minimal triangulation if for any intermediate set E''
with E ⊆ E'' ⊂ E', the graph (V, E'') is not triangulated.

A subset S ⊆ V is an a,b-separator for two nonadjacent vertices a, b ∈ V
if the removal of S from the graph separates a and b in different connected
components. S is a minimal a,b-separator if no proper subset of S separates a
and b. We say that S is a minimal separator of G if there are two vertices a and b
such that S is a minimal a,b-separator. Notice that a minimal separator can be
strictly included in another. We denote by Δ_G the set of all minimal separators
of G.

Let G be a graph and S a minimal separator of G. We denote by C_G(S) the set of
connected components of G − S. A component C ∈ C_G(S) is full if every vertex
of S is adjacent to some vertex of C. We denote by C*_G(S) the set of all full
components of G − S. For the following lemma, we refer to |B|.

Lemma 1. A set S of vertices of G is a minimal a,b-separator if and only if a 
and b are in different full components of S. 
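Lemma 1 gives a direct test for minimal separators: by the definitions above, S is a minimal separator exactly when G − S has at least two full components. The following sketch illustrates this; the adjacency-dictionary representation (vertex mapped to a set of neighbors) and the function names are assumptions made for the example.

```python
def components(adj, removed):
    """Connected components of the graph after deleting the vertices in `removed`."""
    seen = set(removed)
    comps = []
    for s in adj:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        seen.add(s)
        while stack:
            v = stack.pop()
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    comp.add(u)
                    stack.append(u)
        comps.append(comp)
    return comps


def full_components(adj, S):
    """Components of G - S in which every vertex of S has a neighbor."""
    S = set(S)
    return [C for C in components(adj, S)
            if all(any(u in C for u in adj[s]) for s in S)]


def is_minimal_separator(adj, S):
    """S is a minimal a,b-separator for some a, b iff it has two full components."""
    return len(full_components(adj, S)) >= 2
```

On the cycle 1-2-3-4-1, the set {1, 3} has the two full components {2} and {4}, so it is a minimal separator, whereas {1} leaves the graph connected.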

If C ∈ C(S), we say that (S, C) = S ∪ C is a block of S. A block (S, C) is
called full if C is a full component of S.

Definition 1. Two separators S and T cross, denoted by S # T, if there are some
distinct components C and D of G − T such that S intersects both of them. If S
and T do not cross, they are called parallel, denoted by S || T.
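The crossing relation of Definition 1 can be tested directly from the components of G − T, as in the sketch below. The graph representation (a dictionary mapping each vertex to its set of neighbors) and the names `components` and `crosses` are assumptions for illustration only.

```python
def components(adj, removed):
    """Connected components of the graph after deleting the vertices in `removed`."""
    seen = set(removed)
    comps = []
    for s in adj:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        seen.add(s)
        while stack:
            v = stack.pop()
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    comp.add(u)
                    stack.append(u)
        comps.append(comp)
    return comps


def crosses(adj, S, T):
    """S crosses T if S meets at least two distinct components of G - T."""
    S = set(S)
    hit = [C for C in components(adj, T) if C & S]
    return len(hit) >= 2
```

On the cycle 1-2-3-4-1 the separators {1, 3} and {2, 4} cross in both directions, while on a path a-b-c-d-e the separators {b} and {d} are parallel.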

It is easy to prove that these relations are symmetric. Note that for any
pair of parallel separators S and T, T is contained in some block (S, C) of S.

A clique of G is a complete subgraph of G. In P) Dirac showed that all the 
minimal separators of a chordal graph are cliques. Using the fact that a separator 
cannot separate two adjacent vertices we deduce the following lemma. 

Lemma 2. Let G be a graph, S a minimal separator and Ω a clique of G. Then
Ω is included in some block of S. In particular, the minimal separators of a
chordal graph are pairwise parallel.

Definition 2. Let G be a graph. The treewidth of G, denoted by tw(G), is the
minimum, over all triangulations H of G, of ω(H) − 1, where ω(H) is the
maximum cliquesize of H.



Definition 3. The minimum fill-in of a graph G, denoted by mfi(G), is the
smallest value of |E(H) − E(G)|, where the minimum is taken over all
triangulations H of G.

In other words, computing the treewidth of G means finding a triangulation 
with smallest cliquesize, while computing the minimum fill-in consists in finding 
a triangulation with smallest number of edges. In both cases we can restrict our 
work to minimal triangulations. 

Let S ∈ Δ_G be a minimal separator. We denote by G_S the graph obtained
from G by completing S, i.e. by adding an edge between every pair of non-
adjacent vertices of S. If Γ ⊆ Δ_G is a set of separators of G, G_Γ is the graph
obtained by completing all the separators of Γ. The results of d, concluded in
d, establish a strong relation between the minimal triangulations of a graph
and its minimal separators.

Theorem 1. Let Γ ⊆ Δ_G be a maximal set of pairwise parallel separators of G.
Then H = G_Γ is a minimal triangulation of G and Δ_H = Γ.

Let H be a minimal triangulation of a graph G. Then Δ_H is a maximal set
of pairwise parallel separators of G and H = G_{Δ_H}.

In other terms, every minimal triangulation of a graph G is obtained by
considering a maximal set Γ of pairwise parallel separators of G and completing
the separators of Γ. The minimal separators of the triangulation are exactly the
elements of Γ.

It is important to know that the elements of Γ, which become the separators
of H, have strictly the same behavior in H as in G. Indeed, the connected
components of H − S are exactly the same as those of G − S, for every S ∈ Γ. Moreover,
the full components are the same in the two graphs, that is C*_H(S) = C*_G(S).

3 Potential Maximal Cliques and Maximal Sets of 
Neighbor Separators 

The previous theorem gives a characterization of the minimal triangulations
of a graph by means of minimal separators, but it needs a global look over
the set of minimal separators. Therefore, it gives no algorithmic information
about how we should construct a minimal triangulation in order to optimize
its cliquesize or the fill-in. In (Q we introduced the notion of “maximal sets of
neighbor separators”, and we showed how these sets are related to the maximal
cliques of any triangulation of a graph that are called here “potential maximal
cliques”. We will give a new tool to recognize the potential maximal cliques of
a graph, which will also help to compute all the potential maximal cliques of
weakly triangulated graphs.

Definition 4. A vertex set Ω of a graph G is called a potential maximal clique
if there is a minimal triangulation H of G such that Ω is a maximal clique of
H.



If K is a vertex set of G, we denote by Δ(K) the minimal separators of G
included in K.

Definition 5. A family 𝒮 of minimal separators of a graph G is called maximal
set of neighbor separators if there is a potential maximal clique Ω of G such that
𝒮 = Δ(Ω). We also say that 𝒮 borders Ω in G.



Definition 6. Let G be a graph and 𝒮 ⊆ Δ_G a set of pairwise parallel separators
such that for any S ∈ 𝒮, there is a block (S, C(S)) containing all the separators
of 𝒮. Suppose that 𝒮, ordered by inclusion, has no greatest element. We define
the piece between the elements of 𝒮 by

P(𝒮) = ⋂_{S ∈ 𝒮} (S, C(S))

Notice that for any S ∈ 𝒮 the block of S containing all the separators of 𝒮
is unique: if T ∈ 𝒮 is not included in S, there is a unique connected component
of G − S containing T − S.

The two following theorems, proved in allow us to recognize a potential 
maximal clique of a graph : 

Theorem 2. Let Ω be a vertex set of G and suppose that Δ(Ω) has a maximum
element S, i.e. every T in Δ(Ω) is included in S. Then Ω is a potential maximal
clique if and only if Ω is some block (S, C) and G_S[Ω] is a clique.



Theorem 3. Let Ω be a vertex set of G and suppose that Δ(Ω) ordered by
inclusion has no greatest element. Then Ω is a potential maximal clique if and
only if Ω = P(Δ(Ω)) and G_{Δ(Ω)}[Ω] is a clique.

Notice that if we consider a minimal triangulation H of G such that Ω is
a maximal clique of H, then the separators of G bordering Ω are also minimal
separators of H, as shown in the following lemma:

Lemma 3. Let H be a minimal triangulation of a graph G and T be a minimal
separator of G such that H[T] is a clique. Then T is also a minimal separator
of H.

We are going to give a strong characterization of potential maximal cliques,
which does not use minimal separators. We need some easy observations on
Theorems 2 and 3.

Proposition 1. Let Ω be a potential maximal clique of G and let S ∈ Δ(Ω).
Then S is strictly contained in Ω and Ω − S is in a full component of S.

Proposition 2. Let Ω be a potential maximal clique of a graph G and let a be
any vertex of V − Ω. There is a minimal separator S ⊂ Ω that separates a and
Ω − S.

Let now K be a set of vertices of a graph G. We denote by C_1(K), ..., C_p(K)
the connected components of G − K. We denote by S_i(K) the vertices of K
adjacent to at least one vertex of C_i(K). When no confusion is possible we will
simply speak of C_i and S_i. If S_i(K) = K we say that C_i(K) is a full component
of K.

Lemma 4. Let Ω be a potential maximal clique of a graph G and let Δ(Ω) be
the maximal set of neighbor separators bordering Ω. Then the elements of Δ(Ω)
are exactly the sets S_i(Ω).

Proof. We prove that for any i, 1 ≤ i ≤ p, S_i is a minimal a,b-separator for
some a ∈ C_i and b ∈ Ω − S_i. Proposition 2 tells us that there is some minimal
separator S ∈ Δ(Ω) that separates a from Ω − S; recall that Ω − S is not empty.
Since every vertex in S_i has a neighbor in C_i, if S does not contain a vertex
x ∈ S_i, S cannot separate x ∈ Ω − S from a, so we get S_i ⊆ S. By Proposition
1, Ω − S is in a full component of S, and therefore of S_i. Let b be a vertex of
Ω − S. Then a and b are in different full components of S_i, so S_i is a minimal
a,b-separator by Lemma 1. We have to prove now that for any minimal separator
S ⊂ Ω, there is some i, 1 ≤ i ≤ p, such that S = S_i. We have that Ω − S ≠ ∅
and Ω − S is in some full component of S. Let C be another full component of
S. Then C is a connected component of G − Ω, let us say C_i. It follows that
S ⊆ S_i. Let x ∈ S_i − S; since C_i is a full component of S_i, we must have x ∈ C,
contradicting C = C_i. So we get S = S_i. We conclude that the separators of G
included in Ω are exactly the sets S_i. □

We also give a “sufficient condition” to characterize the potential maximal
cliques, which is in some sense the dual of Lemma 4.

Theorem 4. Let K ⊆ V be a set of vertices. We denote by 𝒮 the set of all
S_i(K). K is a potential maximal clique if and only if:

1. K has no full components.

2. G_𝒮[K] is a clique.

Moreover, 𝒮 is the maximal set of neighbor separators bordering K.
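The two conditions of Theorem 4 translate directly into a test. The sketch below assumes the same adjacency-dictionary encoding as before and reads "G_𝒮[K] is a clique" as: every two vertices of K are either adjacent in G or lie together in some S_i(K). The helper names are hypothetical.

```python
from itertools import combinations


def component_neighborhoods(adj, K):
    """Return the list of sets S_i(K), one per connected component of G - K."""
    visited, result = set(), []
    for start in adj:
        if start in K or start in visited:
            continue
        component, stack = set(), [start]
        visited.add(start)
        while stack:
            v = stack.pop()
            component.add(v)
            for w in adj[v]:
                if w not in K and w not in visited:
                    visited.add(w)
                    stack.append(w)
        result.append({u for v in component for u in adj[v] if u in K})
    return result


def is_potential_maximal_clique(adj, K):
    """Check conditions 1 and 2 of Theorem 4 for the vertex set K."""
    K = set(K)
    separators = component_neighborhoods(adj, K)
    # Condition 1: no component of G - K sees all of K.
    if any(S == K for S in separators):
        return False
    # Condition 2: completing every S_i turns K into a clique.
    for x, y in combinations(K, 2):
        if y in adj[x]:
            continue
        if not any(x in S and y in S for S in separators):
            return False
    return True


# Example: the cycle 0 - 1 - 2 - 3 - 0.
c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
```

For the 4-cycle, {0, 1, 2} passes the test (it is a maximal clique of the triangulation that adds the chord {0, 2}), whereas the minimal separator {0, 2} fails condition 1 and the whole vertex set fails condition 2.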

Proof. We prove the “only if” part. Suppose that K is a potential maximal
clique of G. By Lemma 4, the maximal set of neighbor separators bordering K
is 𝒮. By Theorems 2 and 3, K is a clique in the graph G_𝒮. It remains to show
that K has no full components. Let C_i be any connected component of G − K.
Then S_i are the neighbors of C_i in K. Since K is a potential maximal clique and
S_i is a separator contained in K, we have that S_i is strictly contained in K, by
Proposition 1. Therefore, C_i is not a full component of K.

We prove now the “if” part. Let us show first that S_i, 1 ≤ i ≤ p, is a
minimal separator. S_i is clearly a separator and C_i is a full component of S_i. Let
x be a vertex of K − S_i. We show that x belongs to a full component of S_i different
from C_i. We denote by C_x the connected component of G − S_i containing x. For
any y ∈ S_i, y must have a neighbor in C_x. This is true if x and y are adjacent
in G. If x and y are not adjacent, by the second condition of the theorem, x and
y belong to a same S_j. C_j being a full component of S_j, there is a path in G
connecting x to y entirely contained in C_j except for x and y; we deduce that
C_j ⊆ C_x. It follows that y has a neighbor in C_x since it has a neighbor in C_j. S_i
is a minimal separator of G according to lemma 0.

Now, given two distinct separators S_i and S_j, we have to show that they
are parallel. We prove that K − S_i is in a connected component of G − S_i. Let
x, y ∈ K − S_i. If x and y are adjacent they are clearly in the same component of
G − S_i. Otherwise, since G_𝒮[K] is a clique, they are in a same S_k, so they are
connected via C_k. So S_j intersects only the component of G − S_i containing
K − S_i, and then S_i ∥ S_j. Therefore 𝒮 consists of pairwise parallel separators.

We have to show that any separator of G included in K is an element of 𝒮.
Consider a minimal triangulation H of G such that all the elements of 𝒮 are
separators of H. We know that K is a clique in H. Now let U ⊆ K be any
minimal separator of G. Notice that U must be strictly included in K, otherwise
K would have two full components in G, contradicting our choice of K. Clearly
U is a clique in H, so by Lemma 3 it is a minimal separator of H. Since K is a
clique in H, it must be included in some full block of U. Let (U, C) be another
full block of U in H, and consequently in G. We have that C is a connected
component of G − U and U separates C and K − U. We deduce that C is also
a connected component of G − K, let us say C_i. By definition of S_i, we have
U ⊆ S_i. Suppose there exists a vertex x ∈ S_i − U; since x has a neighbor in C,
the connected component C of G − U would contain x, contradicting C = C_i. So
we have U = S_i and U ∈ 𝒮.

We want to prove that 𝒮 satisfies the conditions of Theorem 2 or 3. Remark
that for any y ∈ V − K, y is in some connected component C_i of G − K and the
separator S_i ∈ 𝒮 separates y from K − S_i. Suppose now that 𝒮 has an element
S, maximum by inclusion. Let (S, C) be the block of S containing K. By the
previous remark, for any y ∈ V − K, S separates y and K − S, so y ∉ (S, C). It
follows that (S, C) = K, so 𝒮 satisfies all the conditions of Theorem 2. Now if 𝒮
does not have an element maximum by inclusion, clearly K is contained in the
piece between the separators of 𝒮 in G. By the previous remark, P_G(𝒮) does not
contain any y ∈ G − K, so K = P_G(𝒮) and therefore we are under the conditions
of Theorem 3. It follows that 𝒮 forms a maximal set of neighbor separators of G,
bordering the set K. □



4 Triangulating Blocks 

In this section we prove that the potential maximal cliques of a graph are suffi- 
cient to compute its treewidth and its minimum fill-in. 

Let B = (S, C) be a block of the graph G. The graph R(S, C) = G_S[S ∪ C] is
called the realization of the block B. The following proposition, proved in H3,
gives a relation between the treewidth and the minimum fill-in of a graph and
some minimal triangulations of its realizations.

Proposition 3. Let G be a non-complete graph. Then

\[ \mathrm{tw}(G) = \min_{S \in \Delta_G} \; \max_{C \in \mathcal{C}(S)} \mathrm{tw}(R(S, C)) \]

\[ \mathrm{mfi}(G) = \min_{S \in \Delta_G} \Bigl( \mathrm{fill}(S) + \sum_{C \in \mathcal{C}(S)} \mathrm{mfi}(R(S, C)) \Bigr) \]

where fill(S) is the number of non-edges of S.

We want to give a characterization of the minimal triangulations of a realization
R(S, C) using the potential maximal cliques Ω with S ⊂ Ω and Ω ⊆ (S, C) and
the minimal triangulations of the realizations of some blocks (S_i, C_i), strictly
included in (S, C). We will compute the treewidth and the minimum fill-in by
dynamic programming on blocks.

The minimal triangulations of the realizations of non-full blocks are easily
reducible to the case of full blocks. For the following proposition, see P! :

Proposition 4. Let (S, C) be a non-full block of G and let S* be the vertices of
S adjacent in G to at least one vertex of C. Then

tw(R(S, C)) = max(|S| − 1, tw(R(S*, C)))

mfi(R(S, C)) = fill(S) + mfi(R(S*, C))

It remains to express the treewidth and the minimum fill-in of realizations 
of full blocks from realizations of smaller blocks. For this let us give first a 
characterization of minimal triangulations of a graph using a potential maximal 
clique and the realizations of some blocks. The proof has been omitted due to 
space restriction. 

Theorem 5. Let H be a minimal triangulation of G and let Ω be a maximal
clique of H. For each connected component C_i, 1 ≤ i ≤ p, of G − Ω, let S_i be
the vertices of Ω having a neighbor in C_i. Then H_i = H[S_i ∪ C_i] are minimal
triangulations of the realizations R(S_i, C_i).

Conversely, let Ω be a potential maximal clique of G. For each connected
component C_i, 1 ≤ i ≤ p, of G − Ω, let H_i be a minimal triangulation of R(S_i, C_i).
Then H = (V, E(H)) with E(H) = ⋃_i E(H_i) ∪ {{x, y} | x, y ∈ Ω} is a minimal
triangulation of G.

One can also prove that for any minimal triangulation H(S, C) of the
realization of any full block R(S, C), there is a maximal clique Ω of H(S, C) such
that S ⊂ Ω ⊆ (S, C) and Ω is a potential maximal clique of G. We deduce:

Proposition 5. Let (S, C) be a full block of G. Then

\[ \mathrm{tw}(R(S, C)) = \min_{S \subset \Omega \subseteq (S, C)} \; \max_i \bigl( |\Omega| - 1, \; \mathrm{tw}(R(S_i, C_i)) \bigr) \]

\[ \mathrm{mfi}(R(S, C)) = \min_{S \subset \Omega \subseteq (S, C)} \Bigl( \mathrm{fill}(S) + \sum_i \mathrm{mfi}(R(S_i, C_i)) \Bigr) \]

Propositions 3, 4 and 5 give us a dynamic programming algorithm which,
using the list of all potential maximal cliques of a graph G, computes the treewidth
and the minimum fill-in of G. The algorithm is clearly polynomial in the number
of vertices and the number of potential maximal cliques of G.



5 Weakly Triangulated Graphs 

We consider now two non-adjacent vertices x, y of G. Let G' be the graph ob- 
tained from G by adding the edge {x,y}. We will show in this section that the 
potential maximal cliques of G can be computed from the minimal separators 
of G and the potential maximal cliques of G'. We will use this technique to 
compute all the potential maximal cliques of any weakly triangulated graph. 

Let once again Ω be a potential maximal clique of G. Let C_1, …, C_p be the
connected components of G − Ω and let S_i be the set of vertices of Ω having
at least one neighbor in C_i. We want to describe the behaviour of Ω and 𝒮 in the
graph G'. We denote by C'_1, …, C'_q the connected components of G' − Ω and by
S'_1, …, S'_q the neighborhoods of the C'_i in G'. From Theorem 4 it follows that Ω
is a clique in G'_{S'_1, …, S'_q}. If Ω has no full component in G', then Ω is a
potential maximal clique of G'. We deduce:

Theorem 6. Let Ω be a potential maximal clique of G. Let x, y be two
non-adjacent vertices of G and let G' = G ∪ {x, y}. Two cases are possible:

1. Ω can be written as S_1 ∪ {x}, S_1 ∪ {y} or S_1 ∪ S_2, where S_1, S_2 are minimal
   x,y-separators of G.

2. Ω is a potential maximal clique of G'.

The weakly triangulated graphs were introduced in 0. A graph G is called
weakly triangulated if neither G nor its complement Ḡ has an induced cycle
with strictly more than four vertices. This class contains the chordal graphs, the
chordal bipartite graphs and the distance hereditary graphs.

We denote by N(x) the neighbors of the vertex x. We say that two vertices
x, y of a graph G form a two-pair if their common neighbors N(x) ∩ N(y) form
an x,y-separator. It was proved in 0 that every weakly triangulated graph that
is not a clique has a two-pair. Spinrad and Sritharan gave in HSl an algorithm
recognizing the weakly triangulated graphs, based on the following theorem:

Theorem 7. Let G = (V, E) be a graph and let {x, y} be a two-pair of G. Let
G' = (V, E') be the graph obtained from G by adding the edge {x, y}. Then G is
weakly triangulated if and only if G' is weakly triangulated.

Notice that a clique is a weakly triangulated graph. The recognition algorithm
considers an input graph G and, while G has a two-pair {x, y}, it adds the edge
between x and y to G. At the end of the loop, either G became a clique, in
which case the initial graph was weakly triangulated by Theorem 7, or G is not a
clique and it has no two-pair, in which case the input graph could not be weakly
triangulated.

We denote by e the number of edges of G. Let now G = (V, E) be a weakly
triangulated graph and let f_1 = {x_1, y_1}, …, f_e = {x_e, y_e} be the edges added
to G by the recognition algorithm in this order. We denote by G_i the graph
(V, E ∪ {f_1, f_2, …, f_i}), with 0 ≤ i ≤ e (so G_0 = G and G_e is a clique). We will
describe the minimal separators, respectively the potential maximal cliques of
G_i using the minimal separators, respectively the potential maximal cliques of
G_{i+1}, for any i < e.
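The recognition loop and the sequence of added edges f_1, f_2, … can be sketched as follows. This is a naive illustration of the idea, not the algorithm of Spinrad and Sritharan as published: it assumes an adjacency-dictionary encoding with comparable vertex labels, searches for a two-pair by brute force, and treats only non-adjacent vertices as candidates. All names are hypothetical.

```python
from itertools import combinations


def is_two_pair(adj, x, y):
    """True if x, y are non-adjacent and N(x) & N(y) separates x from y."""
    if y in adj[x]:
        return False
    common = adj[x] & adj[y]
    # Search from x in the graph with the common neighbors removed.
    stack, seen = [x], {x}
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w in common or w in seen:
                continue
            if w == y:
                return False
            seen.add(w)
            stack.append(w)
    return True


def complete_by_two_pairs(adj):
    """Add edges between two-pairs while possible.

    Returns (ended_as_clique, added_edges); by Theorem 7 the first value
    tells whether the input graph is weakly triangulated.
    """
    adj = {v: set(nb) for v, nb in adj.items()}   # work on a copy
    added = []
    while True:
        pair = next(((x, y) for x, y in combinations(sorted(adj), 2)
                     if is_two_pair(adj, x, y)), None)
        if pair is None:
            break
        x, y = pair
        adj[x].add(y)
        adj[y].add(x)
        added.append(pair)
    n = len(adj)
    return all(len(nb) == n - 1 for nb in adj.values()), added


c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
c5 = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
```

The 4-cycle is completed by adding the two chords, so it is weakly triangulated; the 5-cycle has no two-pair and is not a clique, so it is rejected.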

It is known that a weakly triangulated graph has at most e minimal
separators (Kloks, 0). Indeed, it is easy to check that if x, y is a two-pair of G
and G' = (V, E ∪ {xy}), then any minimal separator S of G, different from
S_xy = N_G(x) ∩ N_G(y), is also a minimal separator of G'. So for any i < e, G_i
has at most one more minimal separator than G_{i+1}. We deduce:

Proposition 6. A weakly triangulated graph G has at most e minimal
separators, where e is the number of edges of G.

Notice that if x and y form a two-pair of G, then S_xy is the unique minimal
x,y-separator of G. In particular, if Ω is a potential maximal clique of G,
we cannot have two minimal x,y-separators S_1 and S_2 with Ω = S_1 ∪ S_2. So
we can refine the results of Theorem 6:

Proposition 7. Let Ω be a potential maximal clique of G. Let x, y be a two-pair
of G and let G' = G ∪ {xy}. Let S_xy = N_G(x) ∩ N_G(y). Two cases are possible:

1. Ω can be written as S_xy ∪ {x} or S_xy ∪ {y}.

2. Ω is a potential maximal clique of G'.

Proposition 8. A weakly triangulated graph G has at most 2e + 1 potential
maximal cliques.

Proof. We consider the sequence of graphs G_0 = G, G_1, …, G_e previously
defined. Since G_{i+1} is obtained from G_i by adding an edge between a two-pair,
by Proposition 7 G_i has at most two more potential maximal cliques than G_{i+1}.
The graph G_e, which is a clique, has a unique potential maximal clique. □
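The counting behind this proof can be written out as a short telescoping argument. The notation Π(G) for the set of potential maximal cliques of G is introduced here only for the purpose of this worked step and is not taken from the paper.

```latex
\[
  |\Pi(G_i)| \le |\Pi(G_{i+1})| + 2 \quad (0 \le i < e),
  \qquad |\Pi(G_e)| = 1 ,
\]
\[
  |\Pi(G)| = |\Pi(G_0)| \le |\Pi(G_e)| + \sum_{i=0}^{e-1} 2 = 2e + 1 .
\]
```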

Theorem 8. The treewidth and the minimum fill-in of weakly triangulated 
graphs can be computed in polynomial time. 

Clearly, all potential maximal cliques of a weakly triangulated graph can be 
listed in polynomial time, and therefore the treewidth and the minimum fill-in 
are computable in polynomial time. 

6 Conclusions 

We still do not know if the treewidth and the minimum fill-in are polynomially 
tractable for all graphs with a polynomial number of minimal separators. A way 
to prove this conjecture would be to answer the following question: does there
exist a polynomial P such that for any graph G, one can compute a sequence
of graphs G_0 = G, G_1, …, G_p where G_p is a clique, G_{i+1} is obtained from G_i by
adding an edge and, for all i, 1 ≤ i ≤ p, |Δ_{G_i}| ≤ P(|Δ_G|, |V(G)|)?

Abstract. We show that the marked version of the Post Correspondence
Problem, where the words on a list are required to differ in the first 
letter, is decidable. On the other hand, PCP remains undecidable if we 
only require the words to differ in the first two letters. Thus we locate 
the decidability/undecidability-boundary between marked and 2-marked 
PCP. 

1 Introduction: PCP and Marked PCP 

The Post Correspondence Problem (PCP) |S| is one of the most useful
undecidable problems, because it can be simply described and many other problems
can easily be reduced to it, particularly problems in formal language theory. The
general form of the problem is as follows. An instance of PCP is a four-tuple
I = (Σ, Δ, g, h), consisting of a finite source alphabet Σ = {a_1, …, a_n}, a finite
target alphabet Δ and two homomorphisms g, h : Σ* → Δ* (g(ab) = g(a)g(b)
and h(ab) = h(a)h(b) whenever a, b ∈ Σ*). It is enough to define g, h : Σ → Δ*;
the extension is just concatenation. PCP is the following decision problem:

Given I = (Σ, Δ, g, h), is there an x ∈ Σ⁺ such that g(x) = h(x)?

In other words, we have two lists of words g(a_1), …, g(a_n) and h(a_1), …, h(a_n)
and we want to decide if there is a correspondence between them: are there
a_{i_1}, …, a_{i_k} ∈ Σ such that g(a_{i_1}) … g(a_{i_k}) = h(a_{i_1}) … h(a_{i_k})?
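To make the definition concrete, here is a minimal sketch that extends g and h from letters to words and checks a candidate solution. It assumes that source letters are single characters and that each morphism is a dictionary from letters to strings; the instance shown is an invented example, not one from the paper. A bounded brute-force search is included only to illustrate the definition; it is of course not a decision procedure for PCP.

```python
from itertools import product


def apply_morphism(morphism, word):
    """Extend a letter-to-string map to words by concatenation."""
    return "".join(morphism[letter] for letter in word)


def is_solution(g, h, x):
    """x is a solution if it is nonempty and g(x) == h(x)."""
    return len(x) > 0 and apply_morphism(g, x) == apply_morphism(h, x)


def shortest_solution_up_to(g, h, max_length):
    """Brute-force search over all words of length 1..max_length."""
    letters = sorted(g)
    for length in range(1, max_length + 1):
        for candidate in product(letters, repeat=length):
            word = "".join(candidate)
            if is_solution(g, h, word):
                return word
    return None


# Illustrative instance over source letters "1", "2" and target letters a, b, c.
g = {"1": "ab", "2": "c"}
h = {"1": "a", "2": "bc"}
```

Here `is_solution(g, h, "12")` holds because both images equal `abc`, and the bounded search finds the same word.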

The general form of this problem is undecidable 0, the reason being that the
two morphisms together can simulate the computation of a Turing machine on
a specific input. Examining restricted versions of PCP allows one to determine
the exact boundary between decidability and undecidability. For instance, the
problem becomes trivially decidable (but NP-complete) if we ask for the
existence of a solution x of length at most some fixed k |21 p. 228]. If we restrict
to g, h which have to be injective (g is injective if x ≠ y ⇒ g(x) ≠ g(y)), the
problem remains undecidable 0. Also PCP(7), where we restrict to n = 7, is
still undecidable u, but PCP(2) is decidable p. As far as we know, decidability
or undecidability is still open for 2 < n < 7.

* Supported by the Academy of Finland under grant 14047.

A further restriction which we will examine in this paper is to have g and h
marked, which we formally define as follows. If z is a string, we use Pref_k(z) to
denote the prefix of length k of z (Pref_k(z) = z if |z| < k). A homomorphism g
is k-marked if g(a) and g(b) are nonempty and have Pref_k(g(a)) ≠ Pref_k(g(b))
whenever a ≠ b ∈ Σ. An instance I = (Σ, Δ, g, h) of PCP is k-marked if both g
and h are k-marked, and k-marked PCP is the PCP decision problem restricted
to k-marked instances. We will abbreviate 1-marked to marked. If I is marked
then g(a) and g(b) start with a different letter whenever a ≠ b ∈ Σ, which implies
that |Σ| ≤ |Δ|. Without loss of generality we may assume Σ ⊆ Δ. Markedness
clearly implies injectivity: suppose g is marked and x ≠ y ∈ Σ⁺, let x = zax' and
y = zby', a and b being the first letter where x and y differ. Because of markedness
we have g(a) ≠ g(b), hence g(x) = g(z)g(a)g(x') ≠ g(z)g(b)g(y') = g(y), so g
is injective. The converse does not hold. Consider for instance Σ = Δ = {1, 2},
g(1) = 11, g(2) = 12; then g is injective but not marked.
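The definition of k-markedness is easy to check mechanically. The following sketch assumes a morphism is given as a dictionary from source letters to strings; `pref` and `is_k_marked` are names chosen for this illustration.

```python
from itertools import combinations


def pref(k, z):
    """Prefix of length k of z (all of z if z is shorter than k)."""
    return z[:k]


def is_k_marked(morphism, k):
    """All images nonempty, with pairwise different length-k prefixes."""
    images = list(morphism.values())
    if any(image == "" for image in images):
        return False
    return all(pref(k, u) != pref(k, v) for u, v in combinations(images, 2))


# The example from the text: injective, 2-marked, but not marked.
g = {"1": "11", "2": "12"}
```

For this g, `is_k_marked(g, 1)` is false because both images start with 1, while `is_k_marked(g, 2)` is true.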

The proof of decidability of PCP(2) in P is based on a reduction from
arbitrary instances of PCP(2) to marked instances of generalized PCP(2). P
then prove by means of extensive case analysis that marked generalized PCP(2) is
decidable. In particular marked PCP(2) is decidable. Here we prove that marked
PCP is decidable for any alphabet size. We will in fact show that marked PCP
is in EXPTIME (the class of languages that can be recognized in time upper
bounded by 2^{p(N)} for some polynomial p of the input size N).

As stated above, PCP can be used for establishing the boundaries between
decidability and undecidability. The main result of this paper is decidability of
marked PCP. How much can we weaken the markedness condition before we
lose decidability? We will show in Section 3 that 2-marked PCP is undecidable,
thus locating the decidability/undecidability boundary between 1-markedness
and 2-markedness.

In another direction, we can weaken the markedness condition by only
requiring g and h to be prefix morphisms (g is prefix if no g(a_i) is a prefix of
another g(a_j)) or even biprefix (g is biprefix if no g(a_i) is a prefix or suffix of
another g(a_j)). It turns out that biprefix PCP is undecidable 0.¹

2 Marked PCP Is Decidable 

2.1 A Simpler Decision Problem 

We would like to give a decision method for marked PCP. First we give an
algorithm for the following simpler problem, which also occurs in ^ Section 6]:

Given marked I = (Σ, Δ, g, h) and a ∈ Δ, are there x, y ∈ Σ⁺ such that
g(x) = h(y) and g(x) starts with a?

¹ Clearly, a marked morphism is prefix. Both marked and biprefix PCP are special
cases of injective PCP, but 2-marked PCP is not. See also at the end of Section 3.

We do not look for g(x) = h(x) here but only for g(x) = h(y), and we additionally
require that g(x) starts with some specific a ∈ Δ. For example, if I has

g(a_1) = a_1      g(a_2) = a_2      g(a_3) = a_3a_4    g(a_4) = a_4

h(a_1) = a_1a_3   h(a_2) = a_4a_2   h(a_3) = a_3a_3    h(a_4) = a_2a_2

then for a = a_1, a solution would be x = a_1a_3a_2 and y = a_1a_2.

The next algorithm decides the problem. 

1. Set G = H = ∅, i = j = 1.

2. If there are x_1, y_1 ∈ Σ such that g(x_1) and h(y_1) start with a, then set
   x = x_1, y = y_1;
   else goto 4.

3. (a) If g(x) = h(y), then print “solution x = x_1 … x_i and y = y_1 … y_j” and
       terminate.

   (b) If g(x) is not a prefix of h(y) nor vice versa, then goto 4.

   (c) If g(x)s = h(y), then do the following.
       If s ∈ G then goto 4; else set i = i + 1 and G = G ∪ {s}.
       If there is an x_i such that g(x_i) and s start with the same letter, then
       set x = x x_i and goto 3; else goto 4.

   (d) If g(x) = h(y)s, then analogous to previous step.

4. Print “no solution” and terminate.

Informally, we are building x = x_1 … x_i and y = y_1 … y_j, trying to achieve
g(x) = h(y). We add on a new x_{i+1} as long as g(x) is a proper prefix of h(y)
(i.e., g(x)s = h(y) for some suffix s), and add on a new y_{j+1} if h(y) is a proper
prefix of g(x). Note that at each point such x_{i+1} or y_{j+1} are unique (if they
exist) because of markedness; if they do not exist we know there is no solution.
We keep track of the suffixes we have seen so far in the sets G and H. Because
the number of possible suffixes is finite, either the process terminates with a
solution, or at some point a suffix is encountered for the second time, in which
case we know the process will cycle forever and there is no solution.
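The procedure just described can be sketched in a few lines. The sketch assumes single-character source letters, morphisms given as dictionaries to nonempty strings, and marked inputs; it keeps the full words g(x) and h(y) for readability rather than only the overhanging suffix, so it does not aim at the running time discussed in the text. Function and variable names are illustrative.

```python
def solve_simpler_problem(g, h, a):
    """Find x, y with g(x) == h(y) and g(x) starting with the letter a.

    Returns (x, y) as strings of source letters, or None if none exists.
    g and h are assumed to be marked.
    """
    # Markedness makes the next letter unique: index images by first letter.
    next_x = {image[0]: letter for letter, image in g.items()}
    next_y = {image[0]: letter for letter, image in h.items()}
    if a not in next_x or a not in next_y:       # step 2 fails
        return None
    x, y = next_x[a], next_y[a]
    gx, hy = g[x], h[y]
    G, H = set(), set()                          # suffixes seen so far
    while gx != hy:                              # step 3(a) ends the loop
        if hy.startswith(gx):                    # step 3(c): g(x) s = h(y)
            s = hy[len(gx):]
            if s in G or s[0] not in next_x:
                return None
            G.add(s)
            x += next_x[s[0]]
            gx += g[next_x[s[0]]]
        elif gx.startswith(hy):                  # step 3(d): g(x) = h(y) s
            s = gx[len(hy):]
            if s in H or s[0] not in next_y:
                return None
            H.add(s)
            y += next_y[s[0]]
            hy += h[next_y[s[0]]]
        else:                                    # step 3(b)
            return None
    return x, y


# An illustrative marked instance with letters written as digits.
g = {"1": "1", "2": "2", "3": "34", "4": "4"}
h = {"1": "13", "2": "42", "3": "33", "4": "22"}
```

Starting from the letter 1 the search extends x twice and y once before the two images coincide; starting from 3 it stops at once because neither image is a prefix of the other. The last assertion in the accompanying checks uses an instance on which a suffix repeats, so the cycle test fires.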

The solutions produced by this algorithm are of minimal length. Note
carefully that the whole procedure is deterministic, because g and h are marked.
Furthermore, if N is the length of the instance I given as input (i.e., the
number of bits needed to describe the instance), then this procedure runs in time
polynomial in N. Namely, each g(a_i) and h(a_i) can have length at most N, and
hence can have at most N − 1 proper suffixes. Since there are only 2n = O(N)
different g(a_i) and h(a_i), there are only O(N²) different suffixes, hence the loop
of the algorithm can be repeated at most O(N²) times. This loop itself takes
O(N) steps, because (1) to check if g(x) = h(y) or g(x)s = h(y) or g(x) = h(y)s,
we only need to check the way g(x) and h(y) have been changed by the addition
of the previous x_i or y_j, and (2) searching for a new x_i (in step c) or y_j (in step
d) can be done in O(n) = O(N) steps. Therefore the whole procedure runs in
O(N³) steps.

2.2 Reducing to Simpler Instances

Consider an instance I = (Σ, Δ, g, h) of marked PCP: we have two marked
homomorphisms g, h : Σ* → Δ*, where Σ = {a_1, …, a_n} ⊆ Δ, and we want
to decide if there is an x ∈ Σ⁺ such that g(x) = h(x). Below we describe an
approach to decide I by reducing it to an equivalent but simpler instance I' of
marked PCP (“equivalent” meaning that I has a solution iff I' has one).

Suppose Δ = {a_1, …, a_l}, l ≥ n. We can run the procedure of the
previous section for every a_i ∈ Δ, yielding pairs of (minimal-length) solutions
(u_1, v_1), …, (u_l, v_l) where u_i, v_i ∈ Σ⁺ and g(u_i) = h(v_i) starts with a_i, or
non-existence of solutions for certain i. At most n of the a_i can have a solution.
Without loss of generality assume 1, …, m ≤ n are the i that have a solution.
We can turn this into a new instance I' = (Σ', Δ, g', h') of PCP, where
Σ' = {a_1, …, a_m}, g'(a_i) = u_i and h'(a_i) = v_i. Note that g' and h' are marked,
so I' is an instance of marked PCP. Also, since the procedure of the previous
section runs in O(N³) steps and has to be run n times here, I' can be built from
I in O(N⁴) steps. The reduction from I to I' preserves equivalence:

Lemma 1. If I and I' are as above, then I and I' are equivalent. 

Proof. Note that every solution x to I must be built up from the u_i and v_i: there
must be i_1, …, i_k such that x = u_{i_1} ⋯ u_{i_k} = v_{i_1} ⋯ v_{i_k}. This is easy to see
from the example in Figure 1. Here u_1 = a_5a_3a_1 and v_1 = a_5a_3 is a solution
to the simpler problem for a_1, similarly (a_2a_4, a_1a_2) is a solution for a_6 and
(a_6a_3, a_4a_6a_3) is a solution for a_2. Here x = a_5a_3a_1a_2a_4a_6a_3 is a solution to I,
x' = a_1a_6a_2 is a solution to I', related by x = g'(x').



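[Figure 1: the words g(x) and h(x) for x = a_5a_3a_1a_2a_4a_6a_3, drawn as two
aligned rows of blocks. The row g(x) is cut into g(a_5) g(a_3) g(a_1) | g(a_2) g(a_4) |
g(a_6) g(a_3), corresponding to g'(a_1) = u_1 = a_5a_3a_1, g'(a_6) = u_6 = a_2a_4 and
g'(a_2) = u_2 = a_6a_3. The row h(x) is cut into h(a_5) h(a_3) | h(a_1) h(a_2) |
h(a_4) h(a_6) h(a_3), corresponding to h'(a_1) = v_1 = a_5a_3, h'(a_6) = v_6 = a_1a_2 and
h'(a_2) = v_2 = a_4a_6a_3.]

Fig. 1. How a solution to I translates to I' and vice versa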



In general, by construction, if x' is a solution to I' then x = g'(x') = h'(x')
is a solution to I. And conversely, for every solution x to I there is a solution x'
to I' such that x = g'(x') = h'(x'). Thus I and I' are equivalent. □

If we could prove that I' is somehow simpler than I, then we could repeat the
procedure, reduce to simpler and simpler equivalent instances I'', I''', …, and
eventually decide I. There are at least two ways in which I' can be simpler than
I: |Σ'| < |Σ| (m < n) or σ(I') < σ(I), where σ measures the “suffix complexity”
of an instance I = (Σ, Δ, g, h):

σ(I) = |⋃_{a∈Σ} {x | x is a proper suffix of g(a)}|
     + |⋃_{a∈Σ} {x | x is a proper suffix of h(a)}|

If n = m, we would like I' to be simpler than I in the sense that σ(I') < σ(I).
The following lemma shows that I' at least cannot be more complex than I:

Lemma 2. If I and I' are as above, then σ(I') ≤ σ(I).

Proof. Define the following four sets:

G  = ⋃_{a∈Σ}  {x | x is a proper suffix of g(a)}

G' = ⋃_{a∈Σ'} {x | x is a proper suffix of g'(a)}

H  = ⋃_{a∈Σ}  {x | x is a proper suffix of h(a)}

H' = ⋃_{a∈Σ'} {x | x is a proper suffix of h'(a)}

We will define an injective function p : G' → H. Let u ∈ G', so u is a proper
suffix of some specific g'(a_i) = u_i = x_1 … x_c generated by the procedure of the
previous section. Let x_r be the first letter of u, and s be the shortest suffix of
some h(y_t) due to which x_r was added to u_i in the procedure of the previous
section, so s is a prefix of g(x_r) (see Figure 2) or vice versa. Define p as p(u) = s.



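[Figure 2: the aligned words g(u_i) = g(x_1) ⋯ g(x_{r−1}) g(x_r) g(x_{r+1}) ⋯ g(x_c)
and h(v_i) = h(y_1) ⋯ h(y_t) h(y_{t+1}) ⋯ h(y_d), with u = x_r x_{r+1} … x_c marked
above the g-row and the suffix s of h(y_t) marked below the h-row.]

Fig. 2. The suffix s corresponding to u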



We will show p is injective. If u, u' ∈ G' and p(u) = p(u'), then u and u' are
associated with the same suffix s = p(u), hence u and u' must start with the
same x_r and (by determinism of the procedure of the previous section) continue
in the same way, giving u = u'. Thus p is injective, which implies |G'| ≤ |H|.

Similarly we can define an injective function from H' to G, which proves
|H'| ≤ |G|. It now follows that σ(I') = |G'| + |H'| ≤ |G| + |H| = σ(I). □

2.3 The Algorithm 

We will here give a method to decide if a given instance I = (Σ, Δ, g, h) of
marked PCP has a solution. The idea is to make a sequence of
equivalence-preserving reductions I_0 = I, I_1, I_2, …, such that once in a while a reduction
from I_i to I_{i+1} simplifies the instance (makes the source alphabet or the suffix
complexity smaller). We will show that either this sequence of reductions reaches
an I_j which has source alphabet of size 1 or σ equal to 0 (so I_j is decidable),
or the sequence will repeat itself after a while and start cycling. Such cycles
are detectable, and we will show that every I leading to such a cycle is easily
decidable.

So suppose the sequence of reductions does not reach an I_j with alphabet of
size 1 or σ(I_j) = 0. Then it must get “stuck” at a certain source alphabet size
and σ. That is, there exist k, m and z such that all I_i in the infinite sequence
I_k, I_{k+1}, I_{k+2}, … have source alphabet of size m and have σ(I_i) = z. Now this
sequence must repeat itself after a while, for otherwise there would be infinitely
many distinct instances with the same alphabet and σ-value, contradicting the
next lemma.

Lemma 3. Let Σ = {a_1, …, a_m} ⊆ Δ be finite sets and z be a positive natural
number. There exist only finitely many distinct instances I = (Σ, Δ, g, h) of PCP
that satisfy σ(I) ≤ z.

Proof. An instance I = (Σ, Δ, g, h) is completely specified by giving the 2m
words g(a_1), …, g(a_m), h(a_1), …, h(a_m) ∈ Δ⁺. Note that if one of those words
has length > z + 1, then this word has more than z proper suffixes and σ(I) > z.
Accordingly, each of the 2m words can have length at most z + 1. There are
Σ_{i=1}^{z+1} |Δ|^i ≤ |Δ|^{z+2} such words. Thus there are at most |Δ|^{(z+2)2m} choices for
2m such words, and hence finitely many different I that satisfy σ(I) ≤ z. □

This lemma shows that if the procedure does not converge to very simple
instances then it will cycle, and we can detect this by noting that some I_k and
I_r (k < r) are equal. It remains to show how we can decide such “cycling”
instances of marked PCP. So suppose we have a cycle, assume without loss of
generality that it already starts at I_0:

I_0 → I_1 → ⋯ → I_{r−1} → I_r = I_0,

where I_i = (Σ, Δ, g_i, h_i). By the proof of Lemma 1, for every solution x_i to
some I_i, there is a solution x_{i+1} to I_{i+1} such that x_i = g_{i+1}(x_{i+1}) = h_{i+1}(x_{i+1}).
Suppose x_0 is a solution to I_0 of minimal length. There must exist some solution
x_r to I_r such that

x_0 = g_1 g_2 ⋯ g_r(x_r)

x_0 = h_1 h_2 ⋯ h_r(x_r)

Since the g_i and h_i cannot be length-decreasing, we have |x_0| ≥ |x_r|. But x_0 was
chosen to be a minimal-length solution to I_0 and x_r is also a solution to I_r = I_0,
hence |x_0| = |x_r|. This implies that g_0 (= g_r) and h_0 (= h_r) map the letters
occurring in x_r to letters. But then the first letter of x_r is already a solution,
hence |x_0| = |x_r| = 1. Thus I_0 has a solution iff I_0 has a 1-letter solution (i.e.,
there is an a ∈ Σ_0 such that g_0(a) = h_0(a)), and this is trivially decidable.
Below we summarize this analysis in an algorithm and a theorem:

Decision procedure for marked PCP 

1. Set 𝓘 = ∅, i = 0, I_0 = I.

2. Set i = i + 1.

3. Reduce I_{i−1} to I_i in the way stated above.

4. If I_i has source alphabet of size 1 or σ = 0, then decide I_i, print the outcome
   and terminate.

5. If I_i is simpler than I_{i−1} (smaller source alphabet or σ) then set 𝓘 = ∅ and
   goto 2.

6. If I_i ∈ 𝓘 then there is a cycle and we can decide I_i by checking if it has a
   1-letter solution, print the outcome and terminate;
   else set 𝓘 = 𝓘 ∪ {I_i} and goto 2.

Theorem 1. Marked PCP is decidable. 
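The whole decision procedure fits in a short sketch. It rests on several assumptions made only for this illustration: source letters are single characters, morphisms are dictionaries to nonempty strings, the input is marked, and a proper suffix is taken to be nonempty and different from the whole word. In the two terminal cases of step 4 the sketch tests for a 1-letter solution, which suffices there (with one source letter, g(a)^k = h(a)^k forces g(a) = h(a); with σ = 0 all images are single letters). Names are hypothetical.

```python
def solve_from(g, h, a):
    """Simpler problem: x, y with g(x) == h(y) starting with a, or None."""
    next_x = {w[0]: s for s, w in g.items()}
    next_y = {w[0]: s for s, w in h.items()}
    if a not in next_x or a not in next_y:
        return None
    x, y = next_x[a], next_y[a]
    gx, hy = g[x], h[y]
    seen = set()
    while gx != hy:
        if hy.startswith(gx):
            side, s, table = "g", hy[len(gx):], next_x
        elif gx.startswith(hy):
            side, s, table = "h", gx[len(hy):], next_y
        else:
            return None
        if (side, s) in seen or s[0] not in table:
            return None
        seen.add((side, s))
        if side == "g":
            x += table[s[0]]
            gx += g[table[s[0]]]
        else:
            y += table[s[0]]
            hy += h[table[s[0]]]
    return x, y


def reduce_instance(g, h):
    """Build I' = (g', h'): one new letter per solvable first letter."""
    g2, h2 = {}, {}
    for a in sorted({w[0] for w in g.values()}):
        solution = solve_from(g, h, a)
        if solution is not None:
            g2[a], h2[a] = solution
    return g2, h2


def sigma(g, h):
    """Suffix complexity: distinct proper suffixes of g-images plus h-images."""
    def proper_suffixes(m):
        return {w[i:] for w in m.values() for i in range(1, len(w))}
    return len(proper_suffixes(g)) + len(proper_suffixes(h))


def has_one_letter_solution(g, h):
    return any(g[a] == h[a] for a in g)


def decide_marked_pcp(g, h):
    """Return True iff the marked instance (g, h) has a solution."""
    seen = set()
    while True:
        g2, h2 = reduce_instance(g, h)                     # step 3
        if len(g2) <= 1 or sigma(g2, h2) == 0:             # step 4
            return has_one_letter_solution(g2, h2)
        if len(g2) < len(g) or sigma(g2, h2) < sigma(g, h):
            seen = set()                                   # step 5
        else:
            key = (tuple(sorted(g2.items())), tuple(sorted(h2.items())))
            if key in seen:                                # step 6: cycle
                return has_one_letter_solution(g2, h2)
            seen.add(key)
        g, h = g2, h2
```

As a usage example, `decide_marked_pcp({"1": "ab", "2": "c"}, {"1": "a", "2": "bc"})` reports a solution (the word 12), while the four-letter instance used in the accompanying checks runs into a cycle of reduced instances without a 1-letter solution and is rejected.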

2.4 Complexity Analysis 

Let us analyze the complexity of this algorithm. Let N be the length of the input
instance I. Each reduction from I_i to I_{i+1} can be done in O(N⁴) steps. How
many different reductions do we need to make? For a fixed alphabet size |Σ| ≤
|Δ| = m and suffix complexity z, we can make at most |Δ|^{(z+2)2m} reductions
before detecting a cycle (proof of Lemma 3). Since m = O(N) and z = O(N²),
this gives an upper bound of 2^{O(log N · N³)} on the number of reductions for fixed
alphabet size and suffix complexity. Alphabet size and suffix complexity cannot
increase during the process. There are at most n = O(N) different alphabet sizes
and at most σ(I) = O(N²) different suffix complexities possible, so we have to
make no more than O(N³) · 2^{O(log N · N³)} reductions. Since the set 𝓘 can contain
at most 2^{O(log N · N³)} instances, the test I_i ∈ 𝓘 in step 6 can be performed in
2^{O(log N · N³)} steps. Thus the whole algorithm works in 2^{O(log N · N³)} steps, which
means that marked PCP is in EXPTIME.

3 2-Marked PCP Is Undecidable

Here we will show that if we weaken the condition of markedness, by only
requiring the morphisms to be 2-marked, then PCP becomes undecidable again.

Consider the following semi-group S_7 with set of 5 generators Γ = {a, b, c, d, e}
and 7 relations:

S_7 = ⟨a, b, c, d, e | R⟩

R = {ac = ca, ad = da, bc = cb, bd = db, eca = ce, edb = de, cca = ccae}

Tzeitin PI]| (see also P, p. 445]) proved that the following problem for this
semi-group is undecidable:

Given u, v ∈ Γ⁺, is u = v in S_7?

Note that the set of 7 left-hand sides of R is 2-marked, and similarly for the
set of 7 right-hand sides of R. We will reduce this problem to 2-marked PCP.
We use a slight modification of the standard reduction from word problems to
PCP, involving an alphabet with some underlined letters in order to ensure
2-markedness.

Define the source alphabet as

Σ = Γ ∪ Γ̲ ∪ {B, E, #, #̲, r_1, r_2, …, r_7, r̲_1, r̲_2, …, r̲_7},

where Γ = {a, b, c, d, e}, and r_1, …, r_7 are the 7 relations in R and r̲_1, …, r̲_7
are their underlined versions (considered as single letters), so r_1 = [ac = ca],
r̲_1 = [a̲c̲ = c̲a̲] etc. Define the target alphabet as

Δ = Γ ∪ Γ̲ ∪ {B, E, #, #̲}.

B and E will mark the beginning and end of expressions, respectively. Given
u, v ∈ Γ⁺, g and h are defined by Table 1.





       B      E       #     #̲     a … e     a̲ … e̲     [s = t]   [s̲ = t̲]

  g    Bu#    E       #̲     #     a̲ … e̲     a … e     t̲         s

  h    B      #̲vE     #     #̲     a … e     a̲ … e̲     s         t̲

Table 1. Definition of g and h



Note that the constructed instance $I = (\Sigma, \Delta, g, h)$ is an instance of 2-marked
PCP. The following lemma shows that the reduction preserves equivalence with
Tzeitin's problem:

Lemma 4. Let $u, v, I$ be as above. Then $u = v$ in $S_7$ iff $I$ has a solution.

Proof.

$\Rightarrow$: Suppose $u = v$ in $S_7$. Then there is a sequence
$u = u_1 \to u_2 \to \cdots \to u_k = v$, where $u_i = u'su''$ and $u_{i+1} = u'tu''$,
and $s = t \in R$ or $t = s \in R$. We construct a solution to $I$ by induction on $k$.

If $k = 1$, then $u = v \in \Gamma^+$. Now $x = Bu\#\underline{u}E$ is a solution to $I$.

Now let $I' = (\Sigma, \Delta, g', h')$ be the instance of 2-marked PCP corresponding
to $u = u_{k-1}$ in $S_7$. By the induction hypothesis we can assume that $I'$ has
a minimal-length solution $x'$. It is easy to see that every solution must begin
with $B$ and end with $E$, so $x' = ByE$, and therefore $g'(By) = w\underline{\#}u_{k-1}$ and
$h'(By) = w$ for some $w$. Note that since $I$ and $I'$ only differ in the assignment
$h(E)$ and $h'(E)$, and $E$ cannot occur in $y$ (because $x'$ is minimal), we also have
$g(By) = w\underline{\#}u_{k-1}$ and $h(By) = w$. We distinguish two cases. Firstly,
$u_{k-1} = u'su''$ and $v = u_k = u'tu''$, where $r = [s = t]$ is one of the 7 relations.
Then it is easily verified that $x = By\underline{\#}u'ru''\#\underline{u'tu''}E$ is a solution
to $I$. Secondly, if $u_{k-1} = u'tu''$ and $v = u_k = u'su''$, then
$x = By\underline{\#}u'tu''\#\underline{u'}\,\underline{r}\,\underline{u''}E$ is a solution.
This completes the induction step.

$\Leftarrow$: Suppose $I$ has a solution $x$. We can assume $x$ is of minimal length.
This $x$ must be of the form $Bx_1x_2 \cdots x_mE$, where $x_i \in \Sigma$, so
$g(Bx_1 \cdots x_mE) = Bu\#\,g(x_1 \cdots x_m)E = h(Bx_1 \cdots x_mE) = B\,h(x_1 \cdots x_m)\underline{\#}vE$.
Ignoring the underlining, $g(x) = h(x)$ must be of the form
$Bu_1\#u_2\# \cdots \#u_{k-1}\#u_kE$, where $u_i \in \Gamma^*$, $u_1 = u$ and $u_k = v$.
We will show that $u_i = u_{i+1}$ in $S_7$ for every $1 \le i \le k - 1$, from which
$u = v$ in $S_7$ follows.

Because $\#$ occurs in $h(x_1 \cdots x_m)$, there must be some least $i$ such that
$x_i = \#$, and hence $u = h(x_1 \cdots x_{i-1})$. Since there is no underlining in $u$,
it follows that $x_1, \ldots, x_{i-1}$ must have been chosen from $a, \ldots, e, r_1, \ldots, r_7$.
Let $x_1 \cdots x_{i-1} = w_1 r_{i_1} w_2 r_{i_2} \cdots w_l$, with $w_j \in \Gamma^*$ and
$r_{i_j} = [s_{i_j} = t_{i_j}] \in \{r_1, \ldots, r_7\}$. Then
$u = h(w_1 r_{i_1} w_2 r_{i_2} \cdots w_l) = w_1 s_{i_1} w_2 s_{i_2} \cdots w_l$. See Figure 3 for illustration.



Fig. 3. Picture leading to u = v (alignment of $g(Bx_1 \cdots x_i \cdots x_mE)$ with $h(Bx_1 \cdots x_i \cdots x_mE)$, starting from $g(B) = Bu\#$ and ending with $g(E)$)



Note that $g(x_1 \cdots x_{i-1}) = g(w_1 r_{i_1} w_2 r_{i_2} \cdots w_l) = \underline{w_1 t_{i_1} w_2 t_{i_2} \cdots w_l}$.
But now, since we must have $g(x_1 \cdots x_mE) = h(x_{i+1} \cdots x_mE)$, there must be
a least $j > i$ such that $x_j \in \{\#, \underline{\#}\}$ and
$h(x_{i+1} \cdots x_{j-1}) = g(x_1 \cdots x_{i-1}) = \underline{w_1 t_{i_1} w_2 t_{i_2} \cdots w_l}$.
The latter string (without underlining) is $u_2$. Note that $u_1 = u_2$ in $S_7$, because
$u_1$ ($= u$) and $u_2$ only differ by $u_2$ having $t_{i_j}$ where $u_1$ has $s_{i_j}$.

Continuing this reasoning, we can show that for every two words $u_i, u_{i+1} \in \Gamma^*$
occurring in $g(x) = h(x)$ separated by $\#$ (ignoring underlining), we must have
$u_i = u_{i+1}$ in $S_7$ (some of the words $u_i$ and $u_{i+1}$ may actually already be equal in
$\Gamma^+$). Hence $u = v$ in $S_7$, since $g(x)$ starts with $u_1 = u$ and ends with $u_k = v$. □
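
The forward direction of Lemma 4 can be checked mechanically on small cases. The following Python sketch is only an illustration under stated assumptions: underlined letters are encoded with a leading "_", the relation letters are named "r1" to "r7", and the underlining pattern of g and h (g flips underlining, h preserves it) is the one read off from the proof, since the layout of Table 1 is not fully legible here. The helper names are hypothetical.

```python
# Sketch: build g and h for given u, v and test the solutions used in the proof.
# Encoding assumption: an underlined symbol is written with a leading "_".
GAMMA = "abcde"
# The seven relations s = t of S_7, in the order listed in R.
RELATIONS = [("ac", "ca"), ("ad", "da"), ("bc", "cb"), ("bd", "db"),
             ("eca", "ce"), ("edb", "de"), ("cca", "ccae")]

def plain(word):
    """Word over Gamma as a list of non-underlined symbols."""
    return list(word)

def under(word):
    """Word over Gamma as a list of underlined symbols."""
    return ["_" + ch for ch in word]

def build_instance(u, v):
    """Return the morphisms g, h (as dicts symbol -> list of symbols)."""
    g = {"B": ["B"] + plain(u) + ["#"], "E": ["E"], "#": ["_#"], "_#": ["#"]}
    h = {"B": ["B"], "E": ["_#"] + plain(v) + ["E"], "#": ["#"], "_#": ["_#"]}
    for ch in GAMMA:
        g[ch], g["_" + ch] = ["_" + ch], [ch]   # g flips the underlining
        h[ch], h["_" + ch] = [ch], ["_" + ch]   # h keeps the underlining
    for i, (s, t) in enumerate(RELATIONS, start=1):
        r = f"r{i}"
        g[r], g["_" + r] = under(t), plain(s)
        h[r], h["_" + r] = plain(s), under(t)
    return g, h

def image(morphism, word):
    """Apply a morphism to a word given as a list of symbols."""
    out = []
    for sym in word:
        out.extend(morphism[sym])
    return out

def is_solution(u, v, x):
    g, h = build_instance(u, v)
    return image(g, x) == image(h, x)

# Base case k = 1 (u = v): x = B u # _u E.
base = ["B"] + plain("ac") + ["#"] + under("ac") + ["E"]
# One step ac -> ca with r1 = [ac = ca] (u' and u'' empty).
step = ["B"] + plain("ac") + ["#"] + under("ac") + ["_#", "r1", "#"] + under("ca") + ["E"]
# One step ca -> ac, using the underlined relation letter.
back = ["B"] + plain("ca") + ["#"] + under("ca") + ["_#"] + plain("ca") + ["#", "_r1", "E"]
```

With these definitions, `is_solution("ac", "ac", base)`, `is_solution("ac", "ca", step)` and `is_solution("ca", "ac", back)` all hold, while `base` is not a solution for the instance built from u = ac, v = ca.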

Together with Tzeitin’s result, the above lemma implies: 

Theorem 2. 2-Marked PCP is undecidable. 

To end this section, we note that 2-marked PCP is not a special case of 
injective PCP. For example, the morphism defined by $g(1) = 23$, $g(2) = 2$, $g(3) = 3$
is 2-marked but not injective. We can combine $k$-markedness and injectivity by
calling a morphism $g$ strongly $k$-marked if $g$ is both $k$-marked and prefix (i.e., no
$g(a_i)$ is a prefix of another $g(a_j)$). This clearly implies injectivity. It follows from
a construction of Ruohonen |S| that strongly 5-marked PCP is undecidable: the 
biprefix instances of PCP constructed there to show undecidability of biprefix 
PCP are also 5-marked. Decidability of strongly $k$-marked PCP for $1 < k < 5$ is
still open.
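
The example morphism above can be checked with a few lines of Python. This is a minimal sketch that assumes the following reading of the definitions: a morphism is k-marked if the prefixes of length at most k of the letter images are pairwise distinct, and it is prefix if no image is a prefix of another. The function names are illustrative only.

```python
from itertools import combinations

def apply(morphism, word):
    """Image of a word under a morphism given as dict letter -> image."""
    return "".join(morphism[ch] for ch in word)

def is_k_marked(morphism, k):
    """Assumed definition: images are distinguished by their first k letters."""
    prefixes = [img[:k] for img in morphism.values()]
    return len(set(prefixes)) == len(prefixes)

def is_prefix_morphism(morphism):
    """True if no letter image is a prefix of another letter image."""
    images = list(morphism.values())
    return not any(a.startswith(b) or b.startswith(a)
                   for a, b in combinations(images, 2))

g = {"1": "23", "2": "2", "3": "3"}
```

Here `is_k_marked(g, 2)` holds, `is_prefix_morphism(g)` fails because g(2) is a prefix of g(1), and injectivity fails because `apply(g, "1")` equals `apply(g, "23")`.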



4 Conclusion and Future Work 

We can investigate the boundary between decidability and undecidability by ex- 
amining which restrictions on the Post Correspondence Problem render the prob- 
lem decidable. We have shown here that restricting PCP to marked morphisms 
gives us decidability. On the other hand, 2-marked PCP is still undecidable.
The following questions are left open by this research:

— Is exponential time the best we can do when deciding marked PCP, or is 
there a polynomial-time algorithm for the problem? 

— What about decidability of strongly $k$-marked PCP for $1 < k < 5$?

— What about decidability of marked generalized PCP HE]? 

— The decidability status of PCP with elementary morphisms P, pp. 72- 
77] is still open. A morphism $g$ is elementary if it cannot be written as
a composition $g_2 g_1$ via a smaller alphabet. Marked PCP is a subcase of
elementary PCP which we have shown here to be decidable. Can our results 
help to settle the decidability status of elementary PCP?

Abstract. We investigate the satisfiability problem of word equations
where each variable occurs at most twice (quadratic systems). We obtain
various new results: The satisfiability problem is NP-hard (even for a
single equation). The main result says that once we have fixed the lengths 
of a possible solution, then we can decide in linear time whether there 
is a corresponding solution. If the lengths of a minimal solution were at 
most exponential, then the satisfiability problem of quadratic systems 
would be NP-complete. (The inclusion in NP follows also from ED) 

In the second part we address the problem with regular constraints: The 
uniform version is PSPACE-complete. Fixing the lengths of a possible 
solution doesn’t make the problem much easier. The non-uniform ver- 
sion remains NP-hard (in contrast to the linear time result above). The 
uniform version remains PSPACE-complete. 



1 Introduction 

A major result in combinatorics on words states that the existential theory of 
equations over free monoids is decidable. This result was obtained by Makanin 
0 . who showed that the satisfiability of word equations with constants is de- 
cidable. For the background we refer to El, to the corresponding chapter in the 
Handbook of Formal Languages, j^, or to the forthcoming 0. There are also 
two volumes in the Springer lecture notes series dedicated to word equations and 
related topics: and P). Makanin’s Algorithm is the construction of a finite 

search graph. Its finiteness proof is probably among the most complex proofs in
theoretical computer science. The algorithm was implemented in 1987 at Rouen 
by Abdulrab, see p. 

In 1990 Schulz showed an important generalization: Makanin’s result remains 
true when adding regular constraints, m- Thus, we may specify for each word 
variable $x$ a regular language $L_x$ and we are only looking for solutions where
the value of each variable $x$ is in $L_x$. Having this form it was also possible to
extend Makanin’s result to free partially commutative monoids (known also as 
trace monoids), see P ITT] 

* This work was partially supported by the French German project PROCOPE; the
authors acknowledge this support and the valuable comments of the anonymous
referees.
C. Meinel and S. Tison (Eds.): STACS’99, LNCS 1563, pp. 217-^23 1999.

The inherent complexity of the satisfiability problem of word equations is however
not well understood. The known lower bound follows already from the unary
case. A system of word equations over a unary alphabet of constants is equiva- 
lent (under logspace reductions) to an instance of linear integer programming - 
and vice versa. It is a well-known classical fact that linear integer programming 
is NP-complete, see e.g. |^. Thus, the satisfiability problem for a single equation 
over a binary alphabet of constants is NP-hard. For the upper bound, a first anal- 
ysis in the works of Jaffar and Schulz showed a 4-NEXPTIME result EH3H3|. 
By Koscielski and Pacholski Cor. 4.6] this went down to 3-NEXPTIME. 
The present state of the art is due to Q . Gutierrez showed that the problem is 
in EXPSPACE, in particular, it is in 2-DEXPTIME. Another recent and very 
interesting result is due to Rytter and Plandowski m- It shows that the min- 
imal solution of a word equation is highly compressible in terms of Lempel-Ziv 
encodings. It is conjectured that the length of a minimal solution is at most ex- 
ponential in the denotational length of the equation. If this were true, then the 
Lempel-Ziv encoding has polynomial length and, following IZP, the satisfiability 
problem for word equation with constants would become NP-complete. At the 
moment we believe we are far away from proving this audacious conjecture. 

The objects of interest here are quadratic systems, i.e., systems of word equations 
where each variable occurs at most twice. In combinatorial group theory these 
systems have been introduced by H21, see also [n|. They play an important role 
in the classification of closed surfaces and basic ideas of how to handle quadratic 
equations go back to m- The explicit statement of an algorithm for the solution 
of quadratic systems of word equations appears in m- 

We obtain various new results concerning quadratic systems. We show that the 
satisfiability problem is NP-hard (even for a single equation). The main result 
of the paper states that once we have fixed the lengths of a possible solution, 
then we can decide in linear time whether there is a corresponding solution. As a 
corollary we can say that if the lengths of a minimal solution of solvable quadratic 
systems were at most exponential, then the satisfiability problem would be NP- 
complete. The conclusion of containment in NP follows also from but the 
method here is more direct and yields a much simpler approach to the special 
situation of quadratic systems. 

In the second part we address the problem with regular constraints. The uni- 
form version is PSPACE-complete. We also show that fixing the lengths of a 
possible solution doesn’t make the problem much easier. The non-uniform ver- 
sion remains NP-hard (in contrast to the linear time result above). The uniform 
version remains PSPACE-complete. 

Due to lack of space this extended abstract does not contain all proofs. They 
will appear elsewhere. 

2 Notations and Preliminaries 

Let $A$ be an alphabet of constants and let $\Omega$ be a set of variables. As usual,
$(A \cup \Omega)^*$ means the free monoid over the set $A \cup \Omega$; the empty word is denoted
by $\varepsilon$. A word equation $L = R$ is a pair $(L, R) \in (A \cup \Omega)^* \times (A \cup \Omega)^*$, and a
system of word equations is a set of equations $\{L_1 = R_1, \ldots, L_k = R_k\}$.
A solution is a homomorphism $\sigma: (A \cup \Omega)^* \to A^*$ leaving the letters of $A$ invariant
such that $\sigma(L_i) = \sigma(R_i)$ for all $1 \le i \le k$. A solution $\sigma: \Omega \to A^*$ is called
minimal, if the sum $\sum_{x \in \Omega} |\sigma(x)|$ is minimal.

A system of word equations is called quadratic, if each variable occurs at most 
twice. In the present paper we consider only quadratic systems. 

3 Quadratic Equations 

Quadratic systems are, in principle, easy to solve by using Nielsen transforma- 
tions. The standard algorithm is from uni; it uses non-deterministic linear space 
and works as follows: The first step is to guess which variables can be replaced 
by the empty word. Then we may assume that the first equation is of the form 

$x \cdots = y \cdots$

where $x \neq y$ and $y$ is a variable. Moreover, we may also assume that $x$ is a
prefix of $y$. Then we replace all occurrences of $y$ (at most two) by $xy$, and we
cancel $x$ on the left of the first equation. Having done this, we guess whether $y$
can be replaced by the empty word. Then we repeat the process. The size of the
quadratic system never increases, but the length of a minimal solution decreases
in each round. Hence, the non-deterministic algorithm will find a solution, if
there is any. A non-redundant execution of the algorithm will go through at
most exponentially many different systems. Thus, there is a doubly exponential
upper bound on the length of the minimal solution. This seems to be
quite an overestimation. We have the following conjecture.

Conjecture The length of a minimal solution of a solvable quadratic system 
of word equations is at most polynomial in the input size. 

The value of Theorem 2 would already increase, if only the following much weaker
conjecture were true.

Conjecture (weak form) The length of a minimal solution of a solvable 
quadratic system of word equations is at most exponential in the input size. 
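
The Nielsen-transformation procedure described at the start of this section can be sketched in Python for a single quadratic equation. This is an illustrative sketch, not the cited algorithm: the non-deterministic guesses are replaced by a breadth-first search over all choices, and a visited set plays the role of the non-redundant execution. The encoding (upper-case characters are variables, lower-case characters are constants) and all function names are assumptions made for the example. The search terminates because the transformations never lengthen a quadratic equation.

```python
from collections import deque

def is_var(sym):
    # Encoding assumption: variables are upper-case, constants lower-case.
    return sym.isupper()

def substitute(word, var, repl):
    """Replace every occurrence of var in word by the tuple repl."""
    out = []
    for s in word:
        if s == var:
            out.extend(repl)
        else:
            out.append(s)
    return tuple(out)

def strip_common_prefix(lhs, rhs):
    i = 0
    while i < len(lhs) and i < len(rhs) and lhs[i] == rhs[i]:
        i += 1
    return lhs[i:], rhs[i:]

def successors(lhs, rhs):
    """All equations reachable by one guess of the procedure."""
    if not lhs or not rhs:
        # One side is empty: the only possible move is erasing a variable.
        for s in set(lhs or rhs):
            if is_var(s):
                yield substitute(lhs, s, ()), substitute(rhs, s, ())
        return
    x, y = lhs[0], rhs[0]
    for a, b in ((x, y), (y, x)):
        if is_var(a):
            # Guess a = empty word.
            yield substitute(lhs, a, ()), substitute(rhs, a, ())
            # Guess that b is a proper prefix of a: replace a by b a.
            yield substitute(lhs, a, (b, a)), substitute(rhs, a, (b, a))

def solvable(lhs, rhs):
    """Decide satisfiability of the quadratic equation lhs = rhs."""
    start = strip_common_prefix(tuple(lhs), tuple(rhs))
    seen = {start}
    queue = deque([start])
    while queue:
        l, r = queue.popleft()
        if not l and not r:
            return True
        for nl, nr in successors(l, r):
            state = strip_common_prefix(nl, nr)
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False
```

For instance, `solvable("Xa", "aX")` and `solvable("Xab", "baX")` succeed, while `solvable("Xa", "bX")` fails after exploring only a handful of equations.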

The first result of the present paper shows that the satisfiability problem of word
equations remains NP-hard, even in the restricted case of quadratic systems. In
fact, based on the conjectures above, we strongly believe that it is NP-complete
in this case, see Corollary 1.

Theorem 1. Let $|A| \ge 2$. The following problem is NP-hard.

INSTANCE: A quadratic word equation.

QUESTION: Is there a solution $\sigma: \Omega \to A^*$?

Proof. We give a reduction from 3-SAT. Let $F = C_0 \wedge \cdots \wedge C_{m-1}$ be a
propositional formula in 3-CNF over a set of variables $S$. Each clause has the form

$C_i = (\tilde{X}_{3i} \vee \tilde{X}_{3i+1} \vee \tilde{X}_{3i+2})$

where the $\tilde{X}_j$ are literals. We can assume that every variable has both positive
and negative occurrences.

First we construct a quadratic system of word equations using word variables

$c_i, d_i$, $0 \le i \le m - 1$,

$x_j$, $0 \le j \le 3m - 1$,

$y_X, z_X$, for each $X \in S$.
We use the constants a, b, b. For each clause Ci we have two equations: 
CiX3iX3i+iX3i+2 = and Cid* = 

Now let X G S. Consider the set of positions {ii, . . . ,ik} where X = = 

• • • = Xif^ and the set of positions {ji, . . . , j„} where X = Xj^ = ■ ■ ■ = We 

deal with the case k < n; the case n < k is symmetric. With each X we define 
two more equations: 

Vx^x — ^ Xij • • • Xi^y^^a^bxj.^ ■ ■ ■ Xj^z,^ = C^ba^b. 

The formula is satisfiable if and only if the quadratic system has a solution. 
Next, a system of $k$ word equations $L_1 = R_1, \ldots, L_k = R_k$, $k \ge 1$ with
$R_1 \cdots R_k \in \{a, b\}^*$ is equivalent to a single equation temporarily using a
third constant $c$: $L_1 c \cdots L_{k-1} c L_k = R_1 c \cdots R_{k-1} c R_k$. Finally, we can
eliminate the use of the
third letter c without increasing the number of occurrences of any variable by 
the well-known technique of coding the three letters as aba, abba and abbba and 
replacing each occurrence of a variable x by axa, for a classical reference see uni 
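
The last two steps of the proof (merging a system into one equation and removing the third letter) are simple string rewritings. A minimal Python sketch, assuming equations are pairs of strings in which upper-case characters are variables and the lower-case letters a, b, c are constants; the names `combine` and `encode` are illustrative only:

```python
# Coding of the three constants over the binary alphabet {a, b}.
CODE = {"a": "aba", "b": "abba", "c": "abbba"}

def combine(equations):
    """Join L_1 = R_1, ..., L_k = R_k into L_1 c ... c L_k = R_1 c ... c R_k."""
    lhs = "c".join(l for l, _ in equations)
    rhs = "c".join(r for _, r in equations)
    return lhs, rhs

def encode(word):
    """Code each constant by its block and each variable x as a x a."""
    return "".join(CODE[s] if s in CODE else "a" + s + "a" for s in word)

system = [("Xa", "ab"), ("Y", "b")]
single_lhs, single_rhs = combine(system)
binary_lhs, binary_rhs = encode(single_lhs), encode(single_rhs)
```

The coding does not change how often a variable occurs, so a quadratic system stays quadratic: in the example, X and Y each still occur exactly once in `binary_lhs`.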

The following theorem is the main result of the paper. In a slightly different form 
it appeared first in an unpublished manuscript of the first author m- 

Theorem 2. There is a linear time algorithm to solve the following problem (on 
a unit cost RAM). 

INSTANCE: A quadratic system of word equations with a list of natural numbers
$b_x \in \mathbb{N}$, $x \in \Omega$, written in binary.

QUESTION: Is there a solution $\sigma: \Omega \to A^*$ such that $|\sigma(x)| = b_x$ for all $x \in \Omega$?

Proof. In a linear time preprocessing we can split the system into equations each 
containing a maximum of three variable occurrences: to see this let
$x_1 \cdots x_g = x_{g+1} \cdots x_d$ be a word equation of the system with
$1 \le g < d$, $x_i \in A \cup \Omega$ for $1 \le i \le d$. Then the equation is equivalent to:

$x_1 = y_1$,                 $x_{g+1} = y_{g+1}$,
$y_1 x_2 = y_2$,             $y_{g+1} x_{g+2} = y_{g+2}$,
$\vdots$                     $\vdots$
$y_{g-1} x_g = y_g$,         $y_{d-1} x_d = y_d$,
$y_g = y_d$.

Here $y_1, \ldots, y_d$ denote new variables, each of them occurring exactly twice. After
the obvious simplification of equations with only one variable or constant on each
side, we obtain a system where each equation has the form $z = xy$, $x, y, z \in A \cup \Omega$.
In fact, using (for the first time) that the lengths $b_x$ are given, we may assume
that $b_x \neq 0$ for all variables $x \in \Omega$ and that each equation has the form $z = xy$,
where $z$ is a variable. If $m$ denotes the number of equations, then we can define
the input size of a problem instance $E$ as:

$d(E) = m + \sum_{x \in \Omega} \log_2(b_x)$.
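
The splitting step of the preprocessing can be sketched as follows. This is a hedged illustration only: equations are represented as pairs of symbol lists, and the fresh variables are drawn from a caller-supplied iterator (here named y1, y2, ..., a naming chosen for the example).

```python
from itertools import count

def split_equation(lhs, rhs, fresh):
    """Split x_1...x_g = x_{g+1}...x_d into equations with at most three symbols.

    lhs and rhs are non-empty lists of symbols; fresh yields new variable names.
    """
    def chain(side):
        eqs = []
        y = next(fresh)
        eqs.append(([side[0]], [y]))           # x_1 = y_1
        for sym in side[1:]:
            y_next = next(fresh)
            eqs.append(([y, sym], [y_next]))   # y_{i-1} x_i = y_i
            y = y_next
        return eqs, y

    left, y_g = chain(lhs)
    right, y_d = chain(rhs)
    return left + right + [([y_g], [y_d])]     # finally y_g = y_d

fresh_names = (f"y{i}" for i in count(1))
pieces = split_equation(list("Xab"), list("bY"), fresh_names)
```

For an equation with d symbols this produces d + 1 small equations in one pass, and every new variable occurs exactly twice, so the system remains quadratic.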



For $x \in A \cup \Omega$ let $|x| = b_x$ if $x \in \Omega$, and $|x| = 1$ if $x \in A$. We are looking for a
solution $\sigma$ such that $|\sigma(x)| = |x|$ for all $x \in A \cup \Omega$. A variable $z \in \Omega$ is called
doubly defined, if $E$ contains two equations $z = xy$ and $z = uv$. Let $dd(E)$ be the
number of doubly defined variables. Define $c = 0.55$ and $k = 3 / \ln(1/c)$ ($\approx 5.01$).
Finally, we define the weight of the instance $E$ as follows

$W(E) = |\Omega| + dd(E) + k \sum_{x \in \Omega} \ln |x|$.

We start the algorithm with the assumption that $|z| = |x| + |y|$ for all equations
$z = xy$ and $|x| \neq 0$ for all variables $x$. Since $W(E) \in O(d(E))$, it is enough to
show how to reduce the weight by at least 1 in constant time. If $dd(E) = 0$, then
the system is solvable and we are done. Hence let $dd(E) > 0$. Consider a doubly
defined variable $z$ and its two equations:

$z = xy$,
$z = uv$.

If $|x| = |u|$, then we either have an immediate contradiction or we can eliminate
at least one variable, namely $z$. Therefore, without restriction, $0 < |u| < |x|$.

Let $w$ be a new variable with $|w| = |x| - |u|$. We replace the equations $z = xy$
and $z = uv$ by:

$x = uw$,
$v = wy$.

If $|w| \le c|z|$ we have reduced $\sum_{x \in \Omega} \ln |x|$ by at least $\ln(1/c)$ while increasing
$dd(E)$ by at most 1. So the weight has been reduced by at least 2 and we are
done with this step.

Hence in what follows we assume $|w| > c|z|$.

If $x = v$, then $u$ and $y$ are conjugates. Hence for some $\alpha \ge 0$ and new variables
$r, s$ we can write:

$u = rs$, $w = (rs)^\alpha r$, and $y = sr$.

Note that the values of $\alpha$, $|r|$, and $|s|$ can be calculated in constant time from
$|x|$ and $|u|$. Since the system is quadratic, there are no other occurrences of $x$.
Hence we can replace the equations $x = uw$ and $v = wy$ by:

$u = rs$,
$y = sr$.

The overall effect is that $z$ and $x$ have been replaced by $r$ and $s$. The number of
doubly defined variables may be greater by at most 1, but we have
$|r| < |u| < |z| - |w| < (1 - c)|z|$ and $|s| < |x|$, so that $\sum_{x \in \Omega} \ln |x|$ has been
reduced by at least $\ln(1/(1 - c))$. Hence again we have reduced $W(E)$ by more than 1. Let
us make a comment here: In a minimal solution we must have $\alpha = 0$. But this
contradicts the assumption $|w| > c|z| > (1 - c)|z| > |u|$. Hence, the case $x = v$
is not possible for a minimal solution at this stage of the algorithm.

We are in the case $x \neq v$ and $|w| > c|z|$. If neither $x$ nor $v$ has a second definition,
then $\sum_{x \in \Omega} \ln |x|$ has not increased but the number of doubly defined variables
has decreased by 1 thus decreasing $W(E)$ by 1. Hence we may assume that there
is an equation



$x = pq$.

If $p$ is long, which means here $|p| > c|z|$, then we return to the original situation:

$z = xy$,
$x = pq$,
$z = uv$.

We introduce a new variable $r$ with $|r| = |z| - |p|$ and we replace the first two
equations $z = xy$ and $x = pq$ by:

$z = pr$,
$r = qy$.

Since $|r| < (1 - c)|z|$ and $|x| > c|z|$ we have reduced $\sum_{x \in \Omega} \ln |x|$ by at least
$\ln(c/(1 - c))$ while leaving $dd(E)$ unchanged (since $r$ is not doubly defined). So
$W(E)$ has decreased by more than $k \ln(c/(1 - c)) > 1$ and we are done. Therefore,
the situation is as follows:

$x = uw$,
$x = pq$,
$v = wy$.

We have $|u| < (1 - c)|z|$, $|p| \le c|z|$, and $|w| > c|z|$. If $|u| = |p|$, then again, either
there is an immediate contradiction or we can eliminate at least one variable.
Assume first $|u| < |p|$. Then we introduce a new variable $r$ with $|r| = |p| - |u|$
and we replace the equations $x = uw$ and $x = pq$ by:

$p = ur$,
$w = rq$.

The other (and final) case is in fact symmetric. If $|p| < |u|$, then we introduce $r$
with $|r| = |u| - |p|$ and we replace the equations $x = uw$ and $x = pq$ by:

$u = pr$,
$q = rw$.

Overall, the number of doubly defined variables may have increased by at most
2. The variables $z$ and $x$ are replaced by $r$ and $w$. But, in each case, we have
$|r| < c|z|$ and $|w| < |x|$. Hence $\sum_{x \in \Omega} \ln |x|$ has decreased by at least $\ln(1/c)$ and
so the net decrease in $W(E)$ is at least 1.

Hence, in all cases, we have decreased $W(E)$ by at least 1 in $O(1)$ arithmetic
operations.
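
The constants c = 0.55 and k = 3/ln(1/c) are chosen so that every case of the proof yields a net weight decrease of at least 1. The following Python lines merely recompute the four bounds from the proof as a sanity check; the variable names are illustrative and not part of the original argument.

```python
import math

c = 0.55
k = 3 / math.log(1 / c)

# Case |w| <= c|z|: the log-sum drops by ln(1/c), dd(E) grows by at most 1.
gain_short_w = k * math.log(1 / c) - 1
# Case x = v (conjugates): the log-sum drops by ln(1/(1-c)), dd(E) grows by at most 1.
gain_conjugate = k * math.log(1 / (1 - c)) - 1
# Case p long: the log-sum drops by ln(c/(1-c)), dd(E) is unchanged.
gain_long_p = k * math.log(c / (1 - c))
# Final case: the log-sum drops by ln(1/c), dd(E) grows by at most 2.
gain_final = k * math.log(1 / c) - 2
```

The tightest of these is the case where p is long: `gain_long_p` exceeds 1 only slightly, which suggests why c cannot be moved much closer to 1/2 without also increasing k.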

Remark 1. The method above yields a most general solution in the following
sense. Let $E$ be an instance to the problem of Theorem 2 and assume that $E$
is solvable. Then we produce in linear time a quadratic system over a set of
variables $\Gamma$ (but without doubly defined variables) such that the set of solutions
satisfying the length constraints is in a canonical one-to-one correspondence with
the set of mappings $\psi: \Gamma \to A^*$ where $|\psi(x)| = b_x$ for $x \in \Gamma$.

Corollary 1. If the conjecture (weak form) above is true, then the satisfiability
problem for quadratic systems of word equations is NP-complete.

Remark 2. Given Theorem 1, Corollary 1 follows also from a recent work of
Rytter and Plandowski I2U- They have shown that if the lengths bx,x G El, 
are given in binary as part of the input together with a word equation (not 
necessarily quadratic), then there is a deterministic polynomial time algorithm 
for the satisfiability problem. Their method is based on Lempel-Ziv encodings 
and technically involved. Our contribution shows that the situation becomes 
much simpler for quadratic systems. In particular, we can reduce polynomial 
time to linear time; and our method is fairly straightforward using variable 
splitting. In view of the conjectures above it is not clear that the use of Lempel- 
Ziv encodings can improve the running time for deciding the satisfiability of 
quadratic systems. The most difficult part is apparently to get an idea of the
lengths $b_x$ for $x \in \Omega$. Once these lengths are known (or fixed), the corresponding
satisfiability problem for quadratic systems of word equations becomes extremely
simple.

4 Regular Constraints 

There is an interesting generalization of Makanin’s result. The generalization is 
due to Schulz E2] and says that if a word equation is given with a list of regular 
languages $L_x \subseteq A^*$, $x \in \Omega$, then one can decide whether there is a solution
$\sigma: \Omega \to A^*$ such that $\sigma(x) \in L_x$ for all $x \in \Omega$. In the following we shall assume
that regular languages are specified by non-deterministic finite automata (NFA).
In the uniform version the NFA are part of the input. In the non-uniform version
the NFA are restricted such that each is allowed to have at most $k$ states, where $k$
is a fixed constant being not part of the input. Using a recent result of Gutierrez 
0 one can show that the uniform satisfiability problem of word equations with 
regular constraints can be solved in EXPSPACE (more precisely in DSPACE 
( 2 o(d ^ denotes the input size), see So, from the general case it is not 

really clear whether adding regular constraints makes the satisfiability problem 
of word equations harder. We give here however some evidence that, indeed, it 
does. Restricted to quadratic systems the uniform satisfiability problem with
regular constraints becomes PSPACE-complete.

The non-uniform version is NP-hard and it remains NP-hard, even if the lengths
$b_x$, $x \in \Omega$, are given in unary as part of the input. This is in sharp contrast to
Theorem 2. Having regular constraints it is also easy to find examples where the
length of a minimal solution increases exponentially; the next example is of this
kind. Note however that this refers to the uniform version of the problem.

Example 1. (modified from one by (3) Let n > 0. Consider the following word 
equation with regular constraints: 

A = {a, b}, f2 = {xi I 0 < z < n}, 

Lxi = o, for z < 1, 

Lx, = aA*a\{A*EA*) for i > 1, 

Xnb^Xn = Xobxo b'^Xibxi ■ ■ ■ b^ Xn-\b^~^ Xn-1 

Theorem 3. The following problem is PSPACE-complete.

INSTANCE: A quadratic system of word equations with a list of regular
constraints $L_x \subseteq A^*$, $x \in \Omega$.

QUESTION: Is there a solution $\sigma: \Omega \to A^*$ such that $\sigma(x) \in L_x$ for all $x \in \Omega$?
Moreover, the problem remains PSPACE-complete, if the input is given together
with a list of numbers $b_x$, $x \in \Omega$ (a number $b \in \mathbb{N}$ resp.), written in binary,
and if we ask for a solution satisfying in addition the requirement $|\sigma(x)| = b_x$
($|\sigma(x)| = b$ resp.) for all $x \in \Omega$.

Proof. The PSPACE-hardness follows directly from a well-known result on regular
sets. Let $L_1, \ldots, L_n$ be regular languages specified by NFA. Then the emptiness
problem $L_1 \cap \cdots \cap L_n = \emptyset$ is PSPACE-complete, El. If the intersection is not
empty, then there is a witness of at most exponential length. Let $b$ be this upper
bound on the length of a witness. Using a new letter $c$ such that $c \notin A$, we can
ask whether the intersection

$L_1 c^* \cap \cdots \cap L_n c^*$

contains a word of length $b$. (Instead of using a new letter we may also use
some coding provided $|A| \ge 2$.) The quadratic system is given by $n$ variables
$x_1, \ldots, x_n$ and regular constraints $L_{x_i} = L_i c^*$ for $1 \le i \le n$. The equations are
trivial: $x_1 = x_2, x_2 = x_3, \ldots, x_{n-1} = x_n$.

The PSPACE algorithm for the uniform satisfiability problem is a modification
of the proof of Theorem 2.
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Theorem 4. Let r > A he a fixed constant, which is not part of the input. The 
following problem is NP-complete. 

INSTANCE: A quadratic system of word equations with a list of natural numbers
$b_x \in \mathbb{N}$ written in binary, a list of regular constraints $L_x \subseteq A^*$, $x \in \Omega$, such that
each language can be specified by some NFA of at most $r$ states, and $|A| \ge 2$.
QUESTION: Is there a solution $\sigma: \Omega \to A^*$ such that $|\sigma(x)| = b_x$ and $\sigma(x) \in L_x$
for all $x \in \Omega$?

Moreover, the problem remains NP-hard, if the numbers $b_x$, $x \in \Omega$, are written
in unary, $|A| = 2$, and the system is a single equation.

5 Conclusion 

Problems of satisfiability of quadratic word equations, with or without regular 
constraints, are apparently simple subcases of general problems known to be 
decidable. However there are a number of interesting questions still open. In 
the three cases studied (no constraints, uniform constraints and non-uniform 
constraints), we have only hardness results with no close upper bounds for the 
general problem where no information is given on the lengths of the solution. 
It would be very interesting to find a proof (or disproof!) of the conjectures of
Section 3 on the minimal solution length.
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Abstract. This paper deals with decision problems related to the star 
problem in trace monoids, which means to determine whether the itera- 
tion of a recognizable trace language is recognizable. Due to a theorem 
by Richomme from 1994 |1 S|. we know that the star problem is decid- 
able in trace monoids which do not contain a C4-submonoid. It is not 
known whether the star problem is decidable in C4. In this paper, we 
show undecidability of some related problems: Assume a trace monoid 
which contains a C4. Then, it is undecidable whether for two given recog- 
nizable languages K and L, we have K ^ L* , although we can decide 
K* C L. Further, we can not decide recognizability of K Ci L* as well as 
universality and recognizability of K U L* . 



1 Introduction 

Free partially commutative monoids, also called trace monoids, were introduced 
by Cartier and Foata in 1969 [2|. In 1977, Mazurkiewicz proposed these 
monoids as a potential model for concurrent processes m, which marks the 
beginning of a systematic study of trace monoids by mathematicians and theo- 
retical computer scientists, see e.g., 06C|. 

One main stream in trace theory is the study of recognizable trace languages, 
which can be considered as an extension of the well studied concept of regular 
languages in free monoids. A major step in this research is Ochmanski’s PhD 
thesis from 1984 m Some of the results concerning regular languages in free 
monoids can be generalized to recognizable languages in trace monoids. However, 
there is one major difference: The iteration of a recognizable trace language does 
not necessarily yield a recognizable language. This fact raises the so called star 
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problem: Given a recognizable language L, is L* recognizable? In general, it is 
not known whether the star problem is decidable. The main result after a stream 
of publications dealing with this problem is a theorem stated by Richomme in 
1994, saying that the star problem is decidable in trace monoids which do not 
contain a particular submonoid called C4 m- It is not known whether the star 
problem is decidable in trace monoids with a C4-submonoid. It is even unknown 
for finite trace languages. 

In this paper, we consider some decision problems for recognizable trace 
languages which are related to the star problem. If we have two recognizable 
languages $K$ and $L$ in a trace monoid with a C4-submonoid, then it is
undecidable whether $K$ is a subset of $L^*$ and whether $K \cup L^*$ yields the complete
monoid. Further, recognizability of $K \cup L^*$ and $K \cap L^*$ is undecidable.

The paper is organized as follows. After this introduction, I explain some 
concepts from algebra, formal language theory, and trace theory. We deal with 
recognizable sets, rational sets, and relations between them. Then, we discuss 
some decision problems concerning recognizable and rational trace languages 
and their solutions as far as known. 

In Section 3, we establish a method to define two recognizable trace languages
P and P from a given instance of Post’s Correspondence Problem. We examine 
properties of P, P, and P*. In Section 4, we use these properties to develop the 
main results. In Section 5, we compare the new results to known results.

2 Formal Definitions 

2.1 Monoids, Languages, and Traces 

I briefly introduce basic notions from algebra and trace theory. By $\mathbb{N}$, we denote
the set of natural numbers including zero. Assume two monoids $M_1$ and $M_2$.
We denote their Cartesian product by $M_1 \times M_2$. We denote its elements by
$\binom{p}{q}$. For $p \in M_1$ and $M \subseteq M_2$, we denote by $\binom{p}{M}$ the set of all pairs $\binom{p}{q}$ for $q \in M$.

By an alphabet, we mean a finite set of symbols called letters. Assume an
alphabet $\Sigma$. We denote the free monoid over $\Sigma$ by $\Sigma^*$. We denote the empty
word by $\lambda$. For every word $w \in \Sigma^*$, we call the number of letters of $w$ the length
of w, and denote it by |u>|. For every n G IN, we denote by A-” the set of words 
w G E* with |w| < n. Accordingly, we use the notions A<", A-", and 

We call a binary relation $I$ over $\Sigma$ an independence relation iff $I$ is irreflexive
and symmetric. For every pair of letters $a$ and $b$ with $aIb$, we say that $a$ and
$b$ are independent, otherwise $a$ and $b$ are dependent. We call the pair $(\Sigma, I)$
an independence alphabet. We call two words $w_1, w_2 \in \Sigma^*$ equivalent iff we
can transform $w_1$ into $w_2$ by exchanging independent adjacent letters, which we
denote by $w_1 \sim_I w_2$. For instance, if $a$ and $c$ are independent letters, baacbac,
bacabac and bcaabca are mutually equivalent words.
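
The equivalence of the three example words can be checked by exploring all words reachable through swaps of adjacent independent letters. The Python sketch below follows the definition directly; it is meant for short words only, since the class of a word can be exponentially large. The representation of the independence relation as a set of two-element frozensets and the function name are assumptions made for the example.

```python
def trace_class(word, independent):
    """All words equivalent to word under swaps of adjacent independent letters."""
    seen = {word}
    stack = [word]
    while stack:
        w = stack.pop()
        for i in range(len(w) - 1):
            # Swap w[i] and w[i+1] only if the two letters are independent.
            if frozenset((w[i], w[i + 1])) in independent:
                swapped = w[:i] + w[i + 1] + w[i] + w[i + 2:]
                if swapped not in seen:
                    seen.add(swapped)
                    stack.append(swapped)
    return seen

# a and c are independent; every other pair of letters is dependent.
I = {frozenset("ac")}
cls = trace_class("baacbac", I)
```

Both "bacabac" and "bcaabca" lie in `cls`, whereas "abacbac" does not, because a and b are dependent and may not be exchanged.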

The relation $\sim_I$ is a congruence. For every $w \in \Sigma^*$, we denote by $[w]_I$ the
equivalence class of $w$. Therefore, we can define a monoid with the sets $[w]_I$ as
elements. For $w_1, w_2 \in \Sigma^*$, we define the product of $[w_1]_I$ and $[w_2]_I$ by $[w_1 w_2]_I$.
We denote this monoid by $M(\Sigma, I)$ and call it the trace monoid over $\Sigma$ and $I$.
We call its elements, i.e., the equivalence classes $[w]_I$, traces and its subsets trace
languages or shortly languages. The function $[\,]_I$ is a homomorphism from the
free monoid $\Sigma^*$ to $M(\Sigma, I)$. As long as no confusion arises, we omit the index $I$
at $[\,]_I$.

If $I$ is the empty relation over $\Sigma$, then the trace monoid $M(\Sigma, I)$ is the free
monoid $\Sigma^*$. If $I$ is the biggest irreflexive relation over $\Sigma$, i.e., two letters $a$ and
$b$ are independent iff $a$ and $b$ are different, then the trace monoid $M(\Sigma, I)$ is the
free commutative monoid over $\Sigma$. Opposed to this very brief introduction, we
formally define P3 and C4.

Lemma 1. Assume two disjoint alphabets $\Sigma_1$ and $\Sigma_2$ and assume the
independence relation $I = \Sigma_1 \times \Sigma_2 \cup \Sigma_2 \times \Sigma_1$. The trace monoid $M(\Sigma_1 \cup \Sigma_2, I)$ is
isomorphic to the monoid $\Sigma_1^* \times \Sigma_2^*$. An isomorphism maps every letter $a \in \Sigma_1$
to $\binom{a}{\lambda}$, and every letter $b \in \Sigma_2$ to $\binom{\lambda}{b}$. □

This lemma is an application of a method by Fliess to transform arbitrary trace
monoids into (sub)monoids of Cartesian products of free monoids, see Chapter 1
in (Z). Iff one of the alphabets $\Sigma_1$ and $\Sigma_2$ is a doubleton, and the other one is
a singleton, we denote by P3 both the monoid $\Sigma_1^* \times \Sigma_2^*$ and the independence
alphabet $(\Sigma_1 \cup \Sigma_2, I)$ with $I$ from Lemma 1. Iff both alphabets are doubletons,
we accordingly use the notion C4. The notions P3 and C4 abbreviate path of 3
letters and cycle of 4 letters, respectively.

Assume an independence alphabet $(\Sigma, I)$. A trace $t \in \mathbb{M}(\Sigma, I)$ is called
connected iff for all non-empty traces $t_1$ and $t_2$ with $t = t_1 t_2$, there are a
letter $a$ in $t_1$ and a letter $b$ in $t_2$ such that $a$ and $b$ are dependent. A trace $\binom{u}{v}$ in
P3 or C4 is connected iff $u = \lambda$ or $v = \lambda$. A trace language $L$ is called connected
iff every trace in $L$ is connected.

2.2 Recognizable Sets 

I introduce the concept of recognizability as far as we use it in this paper; for a
more general overview I recommend HE].

Definition 1. Assume a monoid $\mathbb{M}$. An $\mathbb{M}$-automaton is a triple $A = [Q, h, F]$,
where $Q$ is a finite monoid, $h$ is a homomorphism $h : \mathbb{M} \to Q$, and $F$ is a subset
of $Q$. The language of an $\mathbb{M}$-automaton $A$ is defined by $L(A) = h^{-1}(F)$. □

If $L(A) = L$, then we say $A$ is an $\mathbb{M}$-automaton for $L$. We call a set $L \subseteq \mathbb{M}$ a
recognizable language over $\mathbb{M}$ iff there is an $\mathbb{M}$-automaton for $L$. We denote
the class of all recognizable languages over $\mathbb{M}$ by REC($\mathbb{M}$). Calling the triple
$[Q, h, F]$ an $\mathbb{M}$-automaton is due to Courcelle d-. The next theorem is a
classic one; you find the proof in, e.g., m-.
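
To make Definition 1 concrete, here is a small illustrative instance (an assumption for exposition, not from the original): for the free monoid $\{a, b\}^*$ we take $Q = \mathbb{Z}/2$ under addition, let $h$ count the letters $a$ modulo 2, and set $F = \{0\}$. The recognized language is the set of words with an even number of $a$'s.

```python
def h(word):
    """Homomorphism from {a, b}* to (Z/2, +): parity of the number of a's."""
    return sum(1 for x in word if x == "a") % 2


F = {0}  # accepting subset of the finite monoid Q = Z/2


def accepts(word):
    """Membership in L(A) = h^{-1}(F)."""
    return h(word) in F
```

The homomorphism property $h(uv) = h(u) + h(v) \bmod 2$ is what makes the triple an automaton in the sense of the definition.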

Theorem 1. Assume a monoid $\mathbb{M}$. The class REC($\mathbb{M}$) contains the empty set,
$\mathbb{M}$ itself, and it is closed under union, intersection, complement and inverse
homomorphisms. □

Theorem 2. Assume a trace monoid $\mathbb{M}(\Sigma, I)$. The class REC $\mathbb{M}(\Sigma, I)$ contains
all finite subsets of $\mathbb{M}(\Sigma, I)$ and is closed under monoid product and iteration
of connected trace languages. □



The proof of the closure under monoid product originates from Fliess. Closure
under iteration of connected trace languages is due to Ochmanski, Clerbout
and Latteux, and Metivier [17, 3, 15]. Chapter 6 of [7] is a recent survey on
recognizable trace languages including a convenient proof of Theorem 2.

If a trace monoid $\mathbb{M}(\Sigma, I)$ is a free monoid, then every trace is connected.
Thus, Theorem 2 includes the classic result that recognizable languages in free
monoids are closed under iteration.

The following result is widely known as Mezei’s theorem m 



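Theorem 3. Assume two monoids $\mathbb{M}$, $\mathbb{M}'$. A set $L$ is recognizable in $\mathbb{M} \times \mathbb{M}'$ iff
there are an $n \in \mathbb{N}$, sets $L_1, \ldots, L_n \in$ REC($\mathbb{M}$), and sets $L'_1, \ldots, L'_n \in$ REC($\mathbb{M}'$)
such that $L = (L_1 \times L'_1) \cup \ldots \cup (L_n \times L'_n)$. □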



Let us briefly mention the notion of rational sets. Assume some monoid $\mathbb{M}$.
The set of rational expressions REX($\mathbb{M}$) is the smallest set which contains
the symbol $\emptyset$ and the elements of $\mathbb{M}$, and is closed as follows: For the expressions
$r_1, r_2 \in$ REX($\mathbb{M}$), the expressions $r_1^*$, $(r_1 \cup r_2)$, and $(r_1 r_2)$ belong to REX($\mathbb{M}$).
Every rational expression $r$ defines a language $L(r)$ as usual.

We have Kleene’s classic result which asserts that in free monoids the recognizable
sets and the rational sets coincide m. In trace monoids, we have just
one direction due to a more general result by McKnight: every recognizable
trace language is rational. Moreover, we can transform every automaton
into a rational expression which defines the same language. However, there are
rational trace languages which are not recognizable unless the underlying trace
monoid is a free monoid. See Chapter 5 in 0 for more information on rational
trace languages. We continue with a well-known example.



Example 1. We consider the alphabets $\Sigma = \{a, b\}$, $\Sigma_1 = \{a\}$, $\Sigma_2 = \{b\}$ and the
monoids $\Sigma^*$ and $\Sigma_1^* \times \Sigma_2^*$. We define the language $L$ in $\Sigma^*$ by the rational
expression $(ab)^*$. Hence, $L$ is rational and recognizable. Applying the homomorphism
$[\,]$ to $L$, we get $[L] = \left\{ \binom{a^n}{b^n} \,\middle|\, n \in \mathbb{N} \right\}$. We show that $[L]$ is not recognizable
by the closure properties of recognizable sets. By applying the inverse homomorphism
$[\,]^{-1}$ to $[L]$, we get $[\,]^{-1}([L]) = \{ w \mid |w|_a = |w|_b \}$. This language is
not recognizable. If we assume that $[\,]^{-1}([L])$ is recognizable, its intersection with
the recognizable language defined by $a^* b^*$ would also be recognizable. But this
intersection yields $\{ a^n b^n \mid n \in \mathbb{N} \}$, which is not recognizable.

However, $[L]$ is rational, because it is the iteration of a singleton. □



2.3 Some Decision Problems for Trace Languages 

The following decision problems arise: 

Recognizability Problem: Can we decide whether the language of a rational
expression is a recognizable language?

Star Problem: Can we decide whether the iteration of a recognizable language
yields a recognizable language?

In 1987 and 1992, Sakarovitch proved the following theorem IM]. 

Theorem 4. Assume a trace monoid $\mathbb{M}(\Sigma, I)$. The following three assertions
are equivalent:

— $(\Sigma, I)$ does not contain a P3-subalphabet.
— The rational languages of $\mathbb{M}(\Sigma, I)$ form an (effective) Boolean algebra.
— We can decide whether the language of a rational expression yields a
recognizable language. □

During the last 15 years, many papers have dealt with the star problem. Only
partial results have been achieved. I give just a brief survey of their history.
The star problem in the free monoid is trivial due to Kleene, and it is decidable
in free commutative monoids due to Ginsburg and Spanier mam-. In 1984,
Ochmanski examined recognizable trace languages in his PhD thesis and stated
the star problem. During the 80’s, Ochmanski, Clerbout and Latteux, and
Metivier independently proved that the iteration of a connected recognizable
trace language is recognizable nmm-. Sakarovitch’s solution of the recognizability
problem in 1992 (cf. Theorem 4 above) implies the decidability of
the star problem in trace monoids which do not contain a P3-submonoid. The
attempt to extend Sakarovitch’s characterization to the star problem failed:
in the same year, Gastin, Ochmanski, Petit and Rozoy showed the decidability
of the star problem in P3 |0|. During the subsequent years, Metivier
and Richomme showed, besides other results, the decidability of the star problem
for languages containing at most four traces as well as for finite sets containing
at most two connected traces m-. In 1994, Richomme proved the following
theorem m

Theorem 5. The star problem is decidable in trace monoids which do not contain
a C4-submonoid. □

The star problem in trace monoids with a C4-submonoid remains open. 



3 A Tricky Language 

In this section, we show a method to derive two recognizable trace languages
from a given instance of Post’s Correspondence Problem (PCP). We examine
how properties of the iteration of one of the defined languages depend on the
existence or non-existence of a solution of the underlying PCP instance.

An instance of the PCP consists of two alphabets $\Upsilon$ and $\Sigma$, and two homomorphisms
$\alpha, \beta : \Upsilon^* \to \Sigma^*$. We call the letters of $\Upsilon$ indices. It is well known that
it is undecidable whether a given instance of the PCP has a solution, i.e., whether
there is some word $w \in \Upsilon^+$ with $\alpha(w) = \beta(w)$. We assume that for $i \in \Upsilon$,
we have $\alpha(i) \ne \lambda$ and $\beta(i) \ne \lambda$. This restriction of the PCP is also undecidable.

3.1 Definition of $\mathbb{R}$ and $\mathbb{P}$

We define the languages $\mathbb{R}$ and $\mathbb{P}$. We assume an instance of the PCP, consisting
of $\Upsilon$, $\Sigma$, $\alpha$, and $\beta$. We call it the underlying PCP instance. We denote the number
of letters of $\Upsilon$ by $k$ and treat $\Upsilon$ as $\{i_1, \ldots, i_k\}$.

We enrich the alphabet $\Upsilon$ by nine letters: we set $\Gamma = \{i_1, \ldots, i_k, a_1, \ldots, a_9\}$,
where we assume $a_1, \ldots, a_9 \notin \Sigma$. For $n, m$ with $1 \le n \le m \le 9$, we abbreviate the
word $a_n a_{n+1} \ldots a_m$ by $a_{n..m}$, e.g., we write $a_{3..5}$ instead of $a_3 a_4 a_5$.

We need a function $\gamma : \Upsilon^* \to \Gamma^*$ to “code” words in $\Upsilon^*$. We set $\gamma(\lambda) = a_{1..9}$.
For $w \in \Upsilon^*$ and $i \in \Upsilon$, we set $\gamma(wi) = \gamma(w)\, i\, a_{1..9}$. For instance, we have
$\gamma(i_6 i_2) = a_{1..9}\, i_6\, a_{1..9}\, i_2\, a_{1..9}$. Obviously, $\gamma$ is not a homomorphism.
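
The coding can be written down directly. The sketch below is illustrative only: it represents words over the enriched alphabet as Python lists of letter names, and the names `a_block` and `gamma` as well as the string encoding of the letters are assumptions made here.

```python
A = [f"a{n}" for n in range(1, 10)]  # the nine extra letters a1, ..., a9


def a_block(n, m):
    """The word a_n a_{n+1} ... a_m as a list of letters (1 <= n <= m <= 9)."""
    return A[n - 1:m]


def gamma(w):
    """Code a word w (a list of indices): a1..a9 before, between and after the indices."""
    coded = a_block(1, 9)
    for i in w:
        coded = coded + [i] + a_block(1, 9)
    return coded
```

Since every coded word starts and ends with the full block $a_{1..9}$, concatenating two coded words never yields a coded word, which is one way to see that the coding is not a homomorphism.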

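Definition 2. The language $\mathbb{R} \subseteq (\Gamma^* \times \Sigma^*)$ is defined by $\mathbb{R} = \gamma(\Upsilon^+) \times \Sigma^*$. □

We denote the complement of $\mathbb{R}$ by $\overline{\mathbb{R}}$, i.e., $\overline{\mathbb{R}} = (\Gamma^* \times \Sigma^*) \setminus \mathbb{R}$. Consequently,
$\overline{\mathbb{R}}$ yields the language $(\Gamma^* \setminus \gamma(\Upsilon^+)) \times \Sigma^*$. I recommend reading the following
definition just briefly now, and studying the details when we apply it.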



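Definition 3. The language $\mathbb{P} \subseteq (\Gamma^* \times \Sigma^*)$ is defined as the union of the sets:

\begin{align*}
\mathbb{P}_{1,1} &= \bigcup_{i \in \Upsilon} \binom{a_{1..9}\; i\; a_{1}}{\Sigma^{<|\alpha(i)|}} &
\mathbb{P}_{1,2} &= \bigcup_{i \in \Upsilon} \binom{a_{2..9}\; i\; a_{1}}{\Sigma^{\le|\alpha(i)|}} &
\mathbb{P}_{1,3} &= \left\{ \binom{a_{2..9}}{\lambda} \right\} \\
\mathbb{P}_{2,1} &= \left\{ \binom{a_{1..2}}{\lambda} \right\} &
\mathbb{P}_{2,2} &= \bigcup_{i \in \Upsilon} \binom{a_{3..9}\; i\; a_{1..2}}{\Sigma^{=|\alpha(i)|}} &
\mathbb{P}_{2,5} &= \left\{ \binom{a_{4..9}}{\lambda} \right\} \\
\mathbb{P}_{2,3} &= \bigcup_{i \in \Upsilon} \binom{a_{3..9}\; i\; a_{1..3}}{\Sigma^{=|\alpha(i)|} \setminus \{\alpha(i)\}} &
\mathbb{P}_{2,4} &= \bigcup_{i \in \Upsilon} \binom{a_{4..9}\; i\; a_{1..3}}{\Sigma^{=|\alpha(i)|}} \\
\mathbb{P}_{3,1} &= \bigcup_{i \in \Upsilon} \binom{a_{1..9}\; i\; a_{1..4}}{\Sigma^{>|\alpha(i)|}} &
\mathbb{P}_{3,2} &= \bigcup_{i \in \Upsilon} \binom{a_{5..9}\; i\; a_{1..4}}{\Sigma^{\ge|\alpha(i)|}} &
\mathbb{P}_{3,3} &= \left\{ \binom{a_{5..9}}{\lambda} \right\} \\
\mathbb{P}_{4,1} &= \bigcup_{i \in \Upsilon} \binom{a_{1..9}\; i\; a_{1..5}}{\Sigma^{<|\beta(i)|}} &
\mathbb{P}_{4,2} &= \bigcup_{i \in \Upsilon} \binom{a_{6..9}\; i\; a_{1..5}}{\Sigma^{\le|\beta(i)|}} &
\mathbb{P}_{4,3} &= \left\{ \binom{a_{6..9}}{\lambda} \right\} \\
\mathbb{P}_{5,1} &= \left\{ \binom{a_{1..6}}{\lambda} \right\} &
\mathbb{P}_{5,2} &= \bigcup_{i \in \Upsilon} \binom{a_{7..9}\; i\; a_{1..6}}{\Sigma^{=|\beta(i)|}} &
\mathbb{P}_{5,5} &= \left\{ \binom{a_{8..9}}{\lambda} \right\} \\
\mathbb{P}_{5,3} &= \bigcup_{i \in \Upsilon} \binom{a_{7..9}\; i\; a_{1..7}}{\Sigma^{=|\beta(i)|} \setminus \{\beta(i)\}} &
\mathbb{P}_{5,4} &= \bigcup_{i \in \Upsilon} \binom{a_{8..9}\; i\; a_{1..7}}{\Sigma^{=|\beta(i)|}} \\
\mathbb{P}_{6,1} &= \bigcup_{i \in \Upsilon} \binom{a_{1..9}\; i\; a_{1..8}}{\Sigma^{>|\beta(i)|}} &
\mathbb{P}_{6,2} &= \bigcup_{i \in \Upsilon} \binom{a_{9}\; i\; a_{1..8}}{\Sigma^{\ge|\beta(i)|}} &
\mathbb{P}_{6,3} &= \left\{ \binom{a_{9}}{\lambda} \right\}
\end{align*}
□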



This is a neat little rip. We remark that $\mathbb{P}_{1,1}, \ldots, \mathbb{P}_{6,3}$ are mutually disjoint.

By using the letters $a_1, \ldots, a_9$, we control the concatenation of the traces
in $\mathbb{P}$. We can consider two kinds of traces in $\mathbb{P}^*$: “well-formed” traces, i.e.,
traces in $\mathbb{R}$, and “trash” traces. To examine the intersection $\mathbb{R} \cap \mathbb{P}^*$, we just
have to examine the “well-formed” traces in $\mathbb{P}^*$. Thereby, we are able to show
connections between the existence of a solution of the underlying PCP instance
and properties of $\mathbb{R} \cap \mathbb{P}^*$.

3.2 Properties of $\mathbb{R}$, $\mathbb{P}$, and $\mathbb{P}^*$



The first important property of $\mathbb{R}$, $\overline{\mathbb{R}}$ and $\mathbb{P}$ is recognizability. We can effectively
construct $(\Gamma^* \times \Sigma^*)$-automata for $\mathbb{R}$, $\overline{\mathbb{R}}$ and $\mathbb{P}$ from the underlying PCP
instance. See m for details. We examine $\mathbb{P}^*$. We are mainly interested in traces
in $\mathbb{P}^*$ whose first component is a word from $\gamma(\Upsilon^+)$.



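Lemma 2. For every $w \in \Upsilon^+$, we have assertions (1) and (2). If $w$ is not a
solution of the underlying PCP instance, we further have assertion (3).

(1) $\binom{\gamma(w)}{\Sigma^* \setminus \{\alpha(w)\}} \subseteq \mathbb{P}^*$

(2) $\binom{\gamma(w)}{\Sigma^* \setminus \{\beta(w)\}} \subseteq \mathbb{P}^*$

(3) $\binom{\gamma(w)}{\Sigma^*} \subseteq \mathbb{P}^*$ □
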

At this point, we somehow firmly feel that something very unpleasant will happen 
in the case that w is a solution of the underlying PCP instance. 

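Proof: We show (1). We assume some $w \in \Upsilon^+$ and some $u \ne \alpha(w)$ from $\Sigma^*$.
We have to show $\binom{\gamma(w)}{u} \in \mathbb{P}^*$. We branch into three cases, depending on whether
$|u| < |\alpha(w)|$, $|u| = |\alpha(w)|$, or $|u| > |\alpha(w)|$. We write $w = j_1 \ldots j_n$ for some $n \ge 1$
and some $j_1, \ldots, j_n \in \Upsilon$. Consequently, $\alpha(w) = \alpha(j_1) \ldots \alpha(j_n)$.

— Case 1: $|u| < |\alpha(w)|$

We factorize $u$ into $u_1, \ldots, u_n$ such that $|u_1| < |\alpha(j_1)|$, and for $l \in \{2, \ldots, n\}$,
we have $|u_l| \le |\alpha(j_l)|$. At this point, we need the assumption $\alpha(j_1) \ne \lambda$.
We define $t_1$, $t_{n+1}$, and $t_l$ for $l \in \{2, \ldots, n\}$:

$$t_1 = \binom{a_{1..9}\; j_1\; a_1}{u_1} \in \mathbb{P}_{1,1}, \qquad
t_l = \binom{a_{2..9}\; j_l\; a_1}{u_l} \in \mathbb{P}_{1,2}, \qquad
t_{n+1} = \binom{a_{2..9}}{\lambda} \in \mathbb{P}_{1,3}$$

It is a straightforward verification that $t_1 \ldots t_{n+1}$ yields $\binom{\gamma(w)}{u}$.

— Case 2: $|u| = |\alpha(w)|$

We factorize $u$ into $u_1, \ldots, u_n$ such that $|u_l| = |\alpha(j_l)|$ for $l \in \{1, \ldots, n\}$.
Because $u \ne \alpha(w)$, there is some $z \in \{1, \ldots, n\}$ with $u_z \ne \alpha(j_z)$. We define
the following traces for $l \in \{1, \ldots, z-1\}$ and $m \in \{z+1, \ldots, n\}$:

$$\binom{a_{1..2}}{\lambda}, \quad \binom{a_{3..9}\; j_l\; a_{1..2}}{u_l}, \quad \binom{a_{3..9}\; j_z\; a_{1..3}}{u_z}, \quad \binom{a_{4..9}\; j_m\; a_{1..3}}{u_m}, \quad \binom{a_{4..9}}{\lambda}$$

They belong to $\mathbb{P}_{2,1}, \ldots, \mathbb{P}_{2,5}$, respectively. Their concatenation yields $\binom{\gamma(w)}{u}$.

We can show the remaining case $|u| > |\alpha(w)|$ as Case 1, using traces in $\mathbb{P}_{3,1}$,
$\mathbb{P}_{3,2}$, and $\mathbb{P}_{3,3}$ instead. We can prove assertion (2) in the same way using traces
in $\mathbb{P}_{4,1}, \ldots, \mathbb{P}_{6,3}$. If $w$ is not a solution of the underlying PCP instance, i.e.,
if $\alpha(w) \ne \beta(w)$, then (1) and (2) imply (3). See jT3| for details. □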
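
The factorization used for assertion (1) can be replayed on a small instance. The following sketch is an illustration under stated assumptions, not the author's construction verbatim: letters are encoded as strings, `alpha` is a hypothetical dictionary from indices to words, the greedy split in Case 1 is one of several admissible factorizations, and the shapes of the traces follow the sets $\mathbb{P}_{1,1}, \ldots, \mathbb{P}_{3,3}$ as read from the proof above.

```python
A = [f"a{n}" for n in range(1, 10)]


def a_block(n, m):
    """The word a_n ... a_m as a list of letters."""
    return A[n - 1:m]


def gamma(w):
    """Coding of an index word: a1..a9 before, between and after the indices."""
    coded = a_block(1, 9)
    for i in w:
        coded = coded + [i] + a_block(1, 9)
    return coded


def split(u, sizes):
    """Cut the string u into consecutive blocks of the given sizes."""
    parts, pos = [], 0
    for s in sizes:
        parts.append(u[pos:pos + s])
        pos += s
    return parts


def factorize(w, u, alpha):
    """Traces whose product is (gamma(w), u), for u != alpha(w)."""
    lens = [len(alpha[j]) for j in w]
    total, n = sum(lens), len(w)
    if len(u) < total:      # Case 1: first block strictly shorter, others at most as long
        sizes, left = [], len(u)
        for cap in [lens[0] - 1] + lens[1:]:
            sizes.append(min(cap, left))   # greedy: fill blocks from the left
            left -= sizes[-1]
        parts = split(u, sizes)
        traces = [(a_block(1, 9) + [w[0]] + a_block(1, 1), parts[0])]
        traces += [(a_block(2, 9) + [w[l]] + a_block(1, 1), parts[l]) for l in range(1, n)]
        traces.append((a_block(2, 9), ""))
    elif len(u) == total:   # Case 2: blocks of exact length, block z is wrong
        parts = split(u, lens)
        z = next(l for l in range(n) if parts[l] != alpha[w[l]])
        traces = [(a_block(1, 2), "")]
        for l in range(n):
            head = a_block(3, 9) if l <= z else a_block(4, 9)
            tail = a_block(1, 2) if l < z else a_block(1, 3)
            traces.append((head + [w[l]] + tail, parts[l]))
        traces.append((a_block(4, 9), ""))
    else:                   # Case 3: first block strictly longer, others exact
        parts = split(u, [lens[0] + len(u) - total] + lens[1:])
        traces = [(a_block(1, 9) + [w[0]] + a_block(1, 4), parts[0])]
        traces += [(a_block(5, 9) + [w[l]] + a_block(1, 4), parts[l]) for l in range(1, n)]
        traces.append((a_block(5, 9), ""))
    return traces


def concatenate(traces):
    """Product in the monoid of pairs: componentwise concatenation."""
    first, second = [], ""
    for x, y in traces:
        first = first + x
        second = second + y
    return first, second
```

For any `u` different from the image of `w` under `alpha`, `concatenate(factorize(w, u, alpha))` should return the pair `(gamma(w), u)`.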



Corollary 1. If the PCP instance has no solution, we have $\mathbb{R} \subseteq \mathbb{P}^*$. □



This corollary is an obvious conclusion from Lemma 2. We need some kind of
opposite to Lemma 2 and Corollary 1. We show that some traces in $\mathbb{R}$ do not
belong to $\mathbb{P}^*$ if the underlying PCP has a solution. Together with Corollary 1,
we obtain a strong tool, which will allow us to give straightforward proofs of
the main goals of this paper.

Lemma 3. Assume some $w \in \Upsilon^+$ with $\alpha(w) = \beta(w)$. We have $\binom{\gamma(w)}{\alpha(w)} \notin \mathbb{P}^*$. □

Proof: (sketch) We assume some word $w \in \Upsilon^+$ with $\alpha(w) = \beta(w)$ such that
$\binom{\gamma(w)}{\alpha(w)} \in \mathbb{P}^*$. We show a contradiction. Thus, we assume an integer $n \ge 1$ and
traces $t_1, \ldots, t_n \in \mathbb{P}$ such that $t_1 \ldots t_n = \binom{\gamma(w)}{\alpha(w)}$. The first component of $t_1$ is
either $\lambda$, or a word starting with $a_1$. Consequently, we have $t_1 \in \mathbb{P}_{1,1}$, $\mathbb{P}_{2,1}$,
$\mathbb{P}_{3,1}$, $\mathbb{P}_{4,1}$, $\mathbb{P}_{5,1}$, or $\mathbb{P}_{6,1}$, i.e., we have to branch into six cases.

Assume $t_1 \in \mathbb{P}_{1,1}$. We want to determine traces $t_2, \ldots, t_n \in \mathbb{P}$ such that
$t_1 \ldots t_n = \binom{\gamma(w)}{\alpha(w)}$. To obtain the word $\gamma(w)$ as first component, we are forced
to select traces $t_2, \ldots, t_{n-1}$ from $\mathbb{P}_{1,2}$ and to finish with $t_n = \binom{a_{2..9}}{\lambda} \in \mathbb{P}_{1,3}$.
Thus, we have $t_1 \ldots t_n \in \mathbb{P}_{1,1} \mathbb{P}_{1,2}^* \mathbb{P}_{1,3}$. Then, we cannot achieve that the second
component of $t_1 \ldots t_n$ yields $\alpha(w)$. We defined the sets $\mathbb{P}_{1,1}$ and $\mathbb{P}_{1,2}$ in a way
that the second component of $t_1 \ldots t_n$ is properly shorter than $\alpha(w)$.

Let us try to build $\binom{\gamma(w)}{\alpha(w)}$ from traces $t_1, \ldots, t_n \in \mathbb{P}$ by starting with a trace
$t_1 \in \mathbb{P}_{2,1}$. Then, we can show $t_1 \ldots t_n \in \mathbb{P}_{2,1} \mathbb{P}_{2,2}^* \mathbb{P}_{2,3} \mathbb{P}_{2,4}^* \mathbb{P}_{2,5}$. Then, the length
of the second component of $t_1 \ldots t_n$ is exactly the length of $\alpha(w)$. However, there
is one trace from $\mathbb{P}_{2,3}$ among $t_1, \ldots, t_n$. This trace causes an error, i.e., because
of this trace, the second component of $t_1 \ldots t_n$ is different from $\alpha(w)$.

The attempt to start with a trace $t_1 \in \mathbb{P}_{3,1}$, $\mathbb{P}_{4,1}$, $\mathbb{P}_{5,1}$, or $\mathbb{P}_{6,1}$ fails in a
similar way. Note that $\alpha(w) = \beta(w)$. See ini for a presentation of this proof in
a more formal way. □



4 Main Results 



Now, we are able to prove the following theorem.

Theorem 6. The following four assertions are equivalent:

(1) The underlying PCP instance has no solution.
(2) $\mathbb{R} \subseteq \mathbb{P}^*$
(3) $\overline{\mathbb{R}} \cup \mathbb{P}^*$ is recognizable.
(4) $\mathbb{R} \cap \mathbb{P}^*$ is recognizable. □



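Proof: We have (1)→(2) by Corollary 1. To show (2)→(1), assume a solution $w$
of the PCP instance. Then, $\binom{\gamma(w)}{\alpha(w)} \notin \mathbb{P}^*$ by Lemma 3, but $\binom{\gamma(w)}{\alpha(w)} \in \mathbb{R}$ by Def. 2.

We have (2)→(4), because $\mathbb{R} \cap \mathbb{P}^*$ yields $\mathbb{R}$, which is recognizable. We have
(2)→(3), because $\overline{\mathbb{R}} \cup \mathbb{P}^*$ yields the monoid $\Gamma^* \times \Sigma^*$, which is recognizable.

We show (4)→(1). Assume a $\Gamma^* \times \Sigma^*$-automaton $A = [Q, h, F]$ for $\mathbb{R} \cap \mathbb{P}^*$.
Assume a solution $w$ of the PCP instance. For $n \ge 1$, the words $w^n$ are also
solutions. Because $Q$ is finite, there are $m > n \ge 1$ with $h\binom{\lambda}{\alpha(w^n)} = h\binom{\lambda}{\alpha(w^m)}$.
Then, we have $h\binom{\gamma(w^n)}{\alpha(w^n)} = h\binom{\gamma(w^n)}{\alpha(w^m)}$.
Hence, either both or none of the traces $\binom{\gamma(w^n)}{\alpha(w^n)}$ and $\binom{\gamma(w^n)}{\alpha(w^m)}$ belong to $\mathbb{R} \cap \mathbb{P}^*$.
But, on one hand, $\binom{\gamma(w^n)}{\alpha(w^n)} \notin \mathbb{P}^*$ by Lemma 3. On the other hand, we have
$\binom{\gamma(w^n)}{\alpha(w^m)} \in \binom{\gamma(w^n)}{\Sigma^* \setminus \{\alpha(w^n)\}} \subseteq \mathbb{P}^*$ (by Lemma 2(1)). We can show (3)→(1) in the
same way inj. □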

We deduce the main result of this paper. 

Theorem 7. Assume an independence alphabet $(\Sigma, I)$ which contains a C4-subalphabet.
There is no algorithm whose input are two $\mathbb{M}(\Sigma, I)$-automata for
languages $K$ and $L$ in $\mathbb{M}(\Sigma, I)$ and which decides one of the following properties:

(1) $K \subseteq L^*$
(2) $K \cup L^* = \mathbb{M}(\Sigma, I)$
(3) $K \cup L^* \in$ REC $\mathbb{M}(\Sigma, I)$
(4) $K \cap L^* \in$ REC $\mathbb{M}(\Sigma, I)$ □

Proof: (sketch) At first, we note that there is no algorithm whose input are two
alphabets $\Sigma_1$ and $\Sigma_2$, and further, two recognizable languages $K, L \subseteq \Sigma_1^* \times \Sigma_2^*$,
which decides (1), (3), or (4). By usual coding techniques, we obtain the same
result for C4. Then, the generalization to trace monoids with C4-submonoids is
obvious. (2) is an immediate consequence of (1). See for details. □

5 Conclusions and Future Goals 

Let us discuss Theorem 7. In contrast to the undecidability of property (1),
we can decide whether $K^* \subseteq L$ in every trace monoid. Given automata for $K$
and $L$, we can construct a rational expression $k$ for $K$, and check $L(k^*) \subseteq L$.

In contrast to the undecidability of property (2), we can trivially decide whether
$L^*$ yields the complete trace monoid: we simply have to check whether every
letter of $\Sigma$ occurs as a one-letter trace in $L$. Moreover, for recognizable languages
$L$ and $M$, we can decide whether $L^* = M$ |0j.

Deciding property (3) in Theorem 7 is a special case of the recognizability
problem. The star problem is a special case of (3), namely for $K = \emptyset$.

Due to classic results and Theorem 4 by Sakarovitch, properties (1) to (4)
are decidable in trace monoids which do not contain a P3-submonoid.

Let us consider the decision problems in Theorem 7 in arbitrary trace monoids
for finite sets $K$. Then, (1) and (4) are obviously decidable. Richomme remarked
that property (3) restricted to finite sets $K$ is decidable iff the star problem is
decidable. Richomme further showed that (2) is decidable for finite sets $K$ as
follows. Choose some $n \in \mathbb{N}$ such that for $t \in K$, we have $|t| < n$. Then, $K \cup L^*$
yields the complete monoid iff every trace $t$ with $|t| < 2n$ belongs to $K \cup L^*$.

Recently, Marcinkowski and I examined improvements of Theorem 7,
e.g., restrictions to finite sets $L$ and partial generalizations to P3 HSl-.

Abstract. We present an approximation algorithm for the problem of
partitioning the vertices of a weighted graph into p blocks of equal size 
so as to maximize the weight of the edges connecting different blocks. 
The algorithm is based on semidefinite programming and can in some 
sense be viewed as a generalization of the approximation algorithm by 
Frieze and Jerrum for the Max Bisection problem. Our algorithm, as 
opposed to that of Frieze and Jerrum, gives better performance than the 
naive randomized algorithm also for p > 2. 



1 Introduction 

The Max Cut problem takes as input an undirected graph G = (V, E) with 
non-negative edge weights. The objective is to find a cut, i.e., a partition of the 
vertices into two halves, so that the sum of the weights of the edges between the 
two halves is maximized. This is one of the most studied NP optimization prob- 
lems, and the corresponding decision problem was shown to be NP-complete 
by Karp [^. Interest has therefore turned to the design of approximation algo- 
rithms, and for a long time the best such was a 0.5-approximation algorithm, 
i.e., the weight of the cut output by the algorithm is at least 0.5 times the op- 
timum cut. This changed dramatically a few years ago with the seminal paper 
of Goemans and Williamson. They showed how a semidefinite relaxation of
the natural integer programming formulation could be combined with an ele- 
gant randomized rounding scheme to yield a 0.87856-approximation algorithm 
for Max Cut. Obviously this new technique attracted a lot of attention, and 
since then semidefinite programming has become a standard tool for construct- 
ing approximation algorithms. 

Frieze and Jerrum [3] extended the approach of Goemans and Williamson
and applied it to two interesting generalizations of Max Cut: the Max p-Cut
problem, where the vertices are to be partitioned into p parts instead of
two, and the Max Bisection problem, where the vertices are to be partitioned
into two halves of equal size. For the Max p-Cut problem they obtained a
logp)^-approximation algorithm and for the Max Bisection 
problem a 0.651-approximation algorithm. 



C. Meinel and S. Tison (Eds.): STACS’99, LNCS 1563, pp. 237-E^D 1999.

In this paper we study the Max p-Section problem, which is a generalization
of the Max Bisection problem: The vertices are to be partitioned into
p parts of equal size so as to maximize the weight of the edges connecting different
parts. For this problem, the approach of Frieze and Jerrum does not improve
on the obvious randomized algorithm: Select a p-section uniformly at random. It
is easy to see that this gives a ^^-approximation algorithm as the probability 

of an edge being cut is Is there a way to improve on this simple algo- 

rithm? For some problems, e.g. the Max E3-Sat problem, it has been shown 
that there does not exist any approximation algorithm better than the naive ran- 
domized algorithm unless P = NP |S|. The main contribution of this paper is 

a -approximation algorithm for Max p-SECTiON, thus showing 

that the naive randomized algorithm is not the best possible for this problem. 
Our algorithm is based on semidefinite programming. It is easy to formulate but 
the analysis is non-trivial. 

Why is it harder to come up with an approximation algorithm for the Max p-Section
problem than for the Max p-Cut problem? What complicates matters
is the constraint that all parts of the partition must have equal size. Tradition- 
ally, approximation algorithms based on semidefinite programming have been 
analyzed by evaluating, analytically or numerically, the performance on local 
configurations (see e.g. the thorough work of Zwick on constraint satisfaction 
problems 0). For the Max p-Section problem, this technique of analysis must 
be amended with a global analysis to show that an even p-section is produced. 

2 Preliminaries 

Definition 1. The Max p-Section problem is that of finding a partition of the
vertices of a graph $G = (V, E)$ with weights $w_{ij}$ associated with the edges into p
subsets of equal size so as to maximize the total weight of the edges cut by the
partition.

The special case p = 2 is called Max Bisection. 

We will denote by n the number of vertices in the graph. For a p-section to 
exist we must have p | n, so we assume that this is the case. 

Definition 2. An approximation algorithm $A$ for an NP maximization problem
$M$ has performance guarantee $r \le 1$ if $A(I) \ge r \cdot \mathrm{opt}_M(I)$ for all instances $I$
of $M$.

When this holds, we will refer to A as an r-approximation algorithm. 



3 The Main Algorithm 

The foundation of our approximation algorithm is a semidefinite relaxation of
the Max p-Section problem. It is based on the relaxation used for the Max
E2-Lin mod p problem by Andersson, Engebretsen and Hastad 0. To each
vertex $x_i$ corresponds a set of $p$ vectors $\{v_i^k\}_{k=0}^{p-1}$ which are the vertices of a
regular $(p-1)$-simplex in $R^n$ centered at the origin. Following 0, we will refer
to this object as a simplicial porcupine. To the edge $(x_i, x_j)$ corresponds the term
$\frac{p-1}{p}\bigl(1 - \frac{1}{p}\sum_{k=0}^{p-1} \langle v_i^k, v_j^k \rangle\bigr)$ (omitting the weight $w_{ij}$) in the objective function. If
the simplices corresponding to all the vertices $x_i$ shared vertices in $R^n$, we could
solve the Max p-Section problem to optimality using this approach. Alas,
this is not the case, and we therefore add inequalities which are valid for such
a configuration and which simplify the analysis. The semidefinite program we
construct is

\begin{align*}
\text{maximize} \quad & \sum_{(x_i, x_j) \in E} w_{ij}\, \frac{p-1}{p}\Bigl(1 - \frac{1}{p}\sum_{k=0}^{p-1} \langle v_i^k, v_j^k \rangle\Bigr) \\
\text{subject to} \quad & \langle v_i^k, v_i^k \rangle = 1 && \text{for all } i, k, \\
& \langle v_i^k, v_i^{k'} \rangle = -\tfrac{1}{p-1} && \text{for all } i \text{ and all } k \ne k', \\
& \langle v_i^k, v_j^{k'} \rangle \ge -\tfrac{1}{p-1} && \text{for all } i \ne j \text{ and all } k, k', \\
& \langle v_i^{j}, v_{i'}^{j'} \rangle = \langle v_i^{j+k}, v_{i'}^{j'+k} \rangle && \text{for all } i, i' \text{ and all } j, j', k, \\
& \textstyle\sum_i v_i^k = 0 && \text{for all } k.
\end{align*}
(1)



The first two constraints guarantee that $\{v_i^k\}_{k=0}^{p-1}$ is a simplicial porcupine, the
third and fourth constraints make the solution more symmetric and therefore
easier to analyze, and the last constraint encourages an even partition of the
variables after the randomized rounding. For $p = 2$ the relaxation is equivalent
to the relaxation used by Frieze and Jerrum [3] for the Max Bisection problem.

In the remainder of this paper we will analyze the following randomized
algorithm for the Max p-Section problem.



Algorithm 1: Approximation algorithm for Max p-Section.

(1) Solve the above semidefinite program.

(2) Generate $r \in R^n$ by choosing each component as
$N(0, 1)$ independently. For each vertex $x_i$ and each
$j = 0, 1, \ldots, p-1$, let $q_{ij} = \frac{1}{p} + c\langle r, v_i^j \rangle$. Now fix $i$.
If all $q_{ij}$ are in $[0, 2/p]$, set $p_{ij} = q_{ij}$ for all $j$, otherwise
set $p_{ij} = 1/p$ for all $j$.

(3) For each $i$, put $x_i$ in part $j$ of the partition with
probability $p_{ij}$.

(4) Balance the partition so that each part contains $n/p$
vertices. This is described in detail in Section 5 below.
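
Steps (2) and (3) can be sketched in a few lines. This is an illustration only, under assumptions made here: the porcupine of a vertex is given as a list of $p$ vectors (in the demo helper `simplex` it is a fixed regular simplex in $R^p$ rather than an actual solution of the semidefinite program), and the function names are hypothetical.

```python
import math
import random


def simplex(p):
    """Unit vectors forming a regular (p-1)-simplex in R^p, centered at the origin."""
    scale = math.sqrt(p / (p - 1))
    return [[scale * ((1.0 if m == k else 0.0) - 1.0 / p) for m in range(p)]
            for k in range(p)]


def rounding_probabilities(porcupine, r, c):
    """Step (2) for one vertex: q_j = 1/p + c<r, v^j>, or uniform if some q_j leaves [0, 2/p]."""
    p = len(porcupine)
    q = [1.0 / p + c * sum(a * b for a, b in zip(r, v)) for v in porcupine]
    if all(0.0 <= x <= 2.0 / p for x in q):
        return q
    return [1.0 / p] * p


def assign_part(probs, rng):
    """Step (3): draw a part j with probability probs[j]."""
    return rng.choices(range(len(probs)), weights=probs)[0]
```

Because the vectors of a porcupine sum to zero, the returned values always sum to 1, so `assign_part` draws from a proper distribution.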



Note that step (3) is well-defined in the sense that $\sum_j p_{ij} = 1$ for all $i$; this
follows from the property $\sum_j v_i^j = 0$ of simplicial porcupines.

The value of the constant $c$ depends on $p$; a precise expression for $c$ will be
given in Lemma 2 below.

An interesting feature of the algorithm is that it involves three randomized
passes: First $r$ is chosen as a random vector in $R^n$ in step (2), then a preliminary
partition is constructed from independent coin tosses in step (3), and finally the
balancing scheme in step (4) also makes use of randomness.

The running time of the algorithm is dominated by the time it takes to solve
the semidefinite relaxation in step (1).



4 Analyzing Local Configurations

In this section we will analyze the performance of the algorithm up to step (3);
the adjustments made in step (4) will be analyzed in Section 5.

The intuition behind the algorithm is as follows: If the porcupines $\{v_i^k\}_{k=0}^{p-1}$
and $\{v_j^k\}_{k=0}^{p-1}$ corresponding to the vertices $x_i$ and $x_j$ are almost perfectly
misaligned, in the sense that $\langle v_i^k, v_j^k \rangle$ is close to $-\frac{1}{p-1}$, then the random variables $p_{ik}$
and $p_{jk}$ will be negatively correlated and the probability that $x_i$ and $x_j$ will be
put in the same part of the partition by step (3) will be less than $1/p$. On the
other hand, if the two porcupines are almost perfectly aligned, corresponding to
the inner products being close to 1, the probability that the vertices will be put
in the same part will be greater than $1/p$.

Consider the edge $(x_i, x_j)$. The contribution to the objective function from
this edge is

$$\frac{p-1}{p}\Bigl(1 - \frac{1}{p}\sum_{k=0}^{p-1} \langle v_i^k, v_j^k \rangle\Bigr) \qquad (2)$$

where we omit the weight $w_{ij}$ from now on as it does not affect the analysis. If
we can bound the ratio between this contribution and the probability that $x_i$
and $x_j$ end up in different parts, we have a bound on the performance guarantee
after step (3). We therefore set out to do just that.

Denote with $X_{ij}(r)$ the probability that the edge $(x_i, x_j)$ is cut given the
random vector $r$. Then the expected performance guarantee after step (3), which
we will denote $G$, satisfies

$$G \ge \min \frac{E[X_{ij}(r)]}{\frac{p-1}{p}\bigl(1 - \langle v_i^0, v_j^0 \rangle\bigr)} \qquad (3)$$

where the minimum is taken over all possible configurations of the two porcupines
$\{v_i^k\}_{k=0}^{p-1}$ and $\{v_j^k\}_{k=0}^{p-1}$, and the expectation is over the choice of $r$. This follows
from $\langle v_i^0, v_j^0 \rangle = \langle v_i^k, v_j^k \rangle$ for all $k$, which is a consequence of the fourth constraint
in the semidefinite program (1).

As the porcupines $\{v_i^k\}_{k=0}^{p-1}$ and $\{v_j^k\}_{k=0}^{p-1}$ together span a space of dimension
at most $2(p-1)$, we will from now on assume that $r \in R^{2(p-1)}$ for the sake
of convenience. Let $\Omega_{ij} = \{r : |\langle v_i^k, r \rangle| \le 1/pc$ and $|\langle v_j^k, r \rangle| \le 1/pc$ for all $k\}$.
Note that $\Omega_{ij}$ is symmetric: $r \in \Omega_{ij}$ if and only if $-r \in \Omega_{ij}$. This will simplify
the analysis later on.

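We can write

$$X_{ij}(r) = 1 - \sum_{k=0}^{p-1} p_{ik}\, p_{jk} =
\begin{cases}
\frac{p-1}{p} - c^2 \sum_{k=0}^{p-1} \langle v_i^k, r \rangle \langle v_j^k, r \rangle & \text{if } r \in \Omega_{ij}, \\
\frac{p-1}{p} & \text{otherwise,}
\end{cases} \qquad (4)$$

and we now turn to bounding $E[X_{ij}(r)]$. As the geometry of $\Omega_{ij}$ makes it hard
to calculate an exact expression, we will settle for a lower bound. This suffices
as we seek to bound the performance guarantee from below.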

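Let $D_c = \{r : |r| \le 1/pc\}$. Then $D_c \subseteq \Omega_{ij}$ and we obtain

\begin{align*}
E[X_{ij}(r)] &\ge \int_{D_c} \Bigl(\frac{p-1}{p} - c^2 \sum_{k=0}^{p-1} \langle v_i^k, r \rangle \langle v_j^k, r \rangle\Bigr) \frac{e^{-|r|^2/2}}{(2\pi)^{p-1}}\, dV \\
&= \frac{1}{(2\pi)^{p-1}} \int_{S^{2p-3}} \int_0^{1/(pc)} \Bigl(\frac{p-1}{p} - \frac{pc^2}{2(p-1)} \langle v_i^0, v_j^0 \rangle s^2\Bigr) s^{2p-3} e^{-s^2/2}\, ds\, dS \\
&= \frac{1}{2^{p-1}(p-2)!} \int_0^{1/(pc)^2} \Bigl(\frac{p-1}{p} - \frac{pc^2}{2(p-1)} \langle v_i^0, v_j^0 \rangle u\Bigr) u^{p-2} e^{-u/2}\, du
\end{align*}
(5)

using symmetry and the formula for the surface area of the hypersphere,

$$\int_{S^{2p-3}} dS = \frac{2\pi^{p-1}}{(p-2)!}. \qquad (6)$$

From elementary calculus we have

$$\int u^m e^{-u/2}\, du = -2e^{-u/2}\, 2^m m! \sum_{k=0}^{m} \frac{u^k}{2^k k!} \qquad (7)$$
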

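which gives the lower bound

$$E[X_{ij}(r)] \ge \frac{p-1}{p} \Bigl[ -e^{-u/2} \sum_{k=0}^{p-2} \frac{u^k}{2^k k!} \Bigr]_0^{1/(pc)^2}
- pc^2 \langle v_i^0, v_j^0 \rangle \Bigl[ -e^{-u/2} \sum_{k=0}^{p-1} \frac{u^k}{2^k k!} \Bigr]_0^{1/(pc)^2}. \qquad (8)$$

We want to estimate the ratio between this lower bound and the contribution to
the objective function, $\frac{p-1}{p}\bigl(1 - \langle v_i^0, v_j^0 \rangle\bigr)$. A feature of the lower bound on
$E[X_{ij}(r)]$ is that the only parameter describing the geometric relation between
the two porcupines $\{v_i^k\}_{k=0}^{p-1}$ and $\{v_j^k\}_{k=0}^{p-1}$ is the scalar product $\langle v_i^0, v_j^0 \rangle$, which is
also present in the objective function.

Lemma 1.

$$\frac{\frac{p-1}{p} \Bigl[ -e^{-u/2} \sum_{k=0}^{p-2} \frac{u^k}{2^k k!} \Bigr]_0^{1/(pc)^2}
- pc^2 \langle v_i^0, v_j^0 \rangle \Bigl[ -e^{-u/2} \sum_{k=0}^{p-1} \frac{u^k}{2^k k!} \Bigr]_0^{1/(pc)^2}}
{\frac{p-1}{p}\bigl(1 - \langle v_i^0, v_j^0 \rangle\bigr)} \qquad (9)$$

is an increasing function in $\langle v_i^0, v_j^0 \rangle$ for $\langle v_i^0, v_j^0 \rangle \in \bigl[-\frac{1}{p-1}, 1\bigr]$.

Proof. The numerator is the integral of a non-negative function over a subset of
$R^{2(p-1)}$, hence non-negative for all $\langle v_i^0, v_j^0 \rangle$ in the interval and, as a special
case, for $\langle v_i^0, v_j^0 \rangle = 1$. This means that the fraction above can be written as

$$\frac{a - b \langle v_i^0, v_j^0 \rangle}{1 - \langle v_i^0, v_j^0 \rangle} \qquad (10)$$

with $a \ge b \ge 0$. This function is increasing for $\langle v_i^0, v_j^0 \rangle \in \bigl[-\frac{1}{p-1}, 1\bigr]$.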
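
The monotonicity claimed at the end of the proof can be checked by differentiating; the following short derivation is an added illustration, writing $x$ for the scalar product $\langle v_i^0, v_j^0 \rangle$:

```latex
\frac{d}{dx}\,\frac{a - bx}{1 - x}
  = \frac{-b(1 - x) + (a - bx)}{(1 - x)^2}
  = \frac{a - b}{(1 - x)^2} \;\ge\; 0
  \qquad \text{for } x < 1 \text{ whenever } a \ge b .
```

So the fraction is non-decreasing wherever it is defined, and its smallest value on the interval is attained at the left endpoint $x = -\frac{1}{p-1}$.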

By the lemma, we only have to consider the case $\langle v_i^0, v_j^0 \rangle = -\frac{1}{p-1}$ when
looking for a lower bound on the performance guarantee. This gives the lower
bound

$$G \ge \frac{p-1}{p} + \frac{pc^2}{p-1} - e^{-1/2(pc)^2} \Bigl( \frac{p-1}{p} \sum_{k=0}^{p-2} \frac{1}{(2p^2c^2)^k k!} + \frac{pc^2}{p-1} \sum_{k=0}^{p-1} \frac{1}{(2p^2c^2)^k k!} \Bigr). \qquad (11)$$

Our first goal is to prove that $G \ge \frac{p-1}{p} + \Omega(p^{-\kappa})$ for some constant $\kappa$. To
that end we need to choose $c$ as a function of $p$.

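Let $c = \alpha/p^{3/2}$. We make the following estimates:

\begin{align*}
& \frac{p-1}{p} \sum_{k=0}^{p-2} \frac{1}{(2p^2c^2)^k k!} + \frac{pc^2}{p-1} \sum_{k=0}^{p-1} \frac{1}{(2p^2c^2)^k k!} \\
&\quad = \frac{p-1}{p} \sum_{k=0}^{p-2} \frac{1}{k!} \Bigl(\frac{p}{2\alpha^2}\Bigr)^k + \frac{\alpha^2}{p^2(p-1)} \sum_{k=0}^{p-1} \frac{1}{k!} \Bigl(\frac{p}{2\alpha^2}\Bigr)^k \\
&\quad \le \{\text{ if } 2\alpha^2 \le 1/2 \,\} \\
&\quad \le \frac{p-1}{p} \cdot \frac{2}{(p-2)!} \Bigl(\frac{p}{2\alpha^2}\Bigr)^{p-2} + \frac{\alpha^2}{p^2(p-1)} \cdot \frac{2}{(p-1)!} \Bigl(\frac{p}{2\alpha^2}\Bigr)^{p-1} \\
&\quad \le \frac{3}{(p-2)!} \Bigl(\frac{p}{2\alpha^2}\Bigr)^{p-2} \le \frac{3p^p}{(2\alpha^2)^{p-2}\, p!} \le \frac{3e^p}{\sqrt{2\pi p}\,(2\alpha^2)^{p-2}}
\end{align*}
(12)

using Stirling’s formula. Thus

$$G \ge \frac{p-1}{p} + \frac{\alpha^2}{p^3} - \frac{3}{\sqrt{2\pi}}\, e^{\,p - 0.5\log p - (p-2)\log(2\alpha^2) - p/2\alpha^2}. \qquad (13)$$
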
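
The size of the correction term can be checked numerically. The sketch below is an added illustration resting on an assumption: it takes the bound (11) to be $\frac{p-1}{p} + \frac{pc^2}{p-1} - e^{-1/2(pc)^2}\bigl(\frac{p-1}{p}\sum_{k=0}^{p-2}\frac{1}{(2p^2c^2)^k k!} + \frac{pc^2}{p-1}\sum_{k=0}^{p-1}\frac{1}{(2p^2c^2)^k k!}\bigr)$, as reconstructed from the derivation, and evaluates it for $c = \alpha/p^{1.5}$ with $\alpha = 0.2$. The function name `lower_bound` is hypothetical.

```python
import math


def lower_bound(p, alpha=0.2):
    """Evaluate the assumed form of bound (11) for c = alpha / p**1.5."""
    c = alpha / p ** 1.5
    x = 1.0 / (2.0 * p * p * c * c)  # equals p / (2 alpha^2)

    def partial_sum(m):
        # sum_{k=0}^{m} 1 / ((2 p^2 c^2)^k k!)
        return sum(x ** k / math.factorial(k) for k in range(m + 1))

    gain = p * c * c / (p - 1)
    tail = math.exp(-x) * ((p - 1) / p * partial_sum(p - 2) + gain * partial_sum(p - 1))
    return (p - 1) / p + gain - tail
```

For small $p$ the exponentially small tail is negligible next to the gain term $pc^2/(p-1) \ge \alpha^2/p^3$, which is what makes a bound of order $p^{-3}$ above $\frac{p-1}{p}$ plausible.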



The following lemma is now easily verified and summarizes this section.

Lemma 2. For $c = 0.2/p^{1.5}$ the performance guarantee $G$ of Algorithm 1 when
the cost of the balancing is neglected satisfies

$$G > \frac{p-1}{p} + \frac{1}{30p^3}. \qquad (14)$$

5 Balancing the Partition

The partition produced in step (3) of Algorithm 1 is not necessarily an even
p-section. In this section we will estimate the decrease in the objective function
due to the cost of balancing and provide a balancing scheme. This concludes the
analysis of the performance guarantee of Algorithm 1.



5.1 The Distance to an Even p-Section 



There are two types of errors that might make the partition uneven:

1. For a fixed $j$, $\sum_i p_{ij}$ may differ from $n/p$.

2. The actual number of vertices placed in part $j$ may differ from $\sum_i p_{ij}$.

We start off by analyzing how balanced the partition can be expected to be.
To that end we let $Z_j = \sum_i p_{ij}$. For fixed $i$, step (2) of the algorithm sets
$p_{ij} = 1/p$ for all $j$ if any $q_{ij} = \frac{1}{p} + c\langle r, v_i^j \rangle$ is outside $[0, 2/p]$. The interval $[0, 2/p]$
is symmetric around $1/p$ and all $q_{ij}$ have symmetric distribution around $1/p$,
hence $E[Z_j] = n/p$. In order to estimate the cost for balancing we need to study
the variance of $Z_j$:


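$$\mathrm{Var}[Z_j] = E\Bigl[\sum_i \sum_{i'} p_{ij}\, p_{i'j}\Bigr] - \frac{n^2}{p^2}
= -c^2 \sum_i \sum_{i'} \int_{R^{2(p-1)} \setminus \Omega_{ii'}} \langle v_i^j, r \rangle \langle v_{i'}^j, r \rangle \frac{e^{-|r|^2/2}}{(2\pi)^{p-1}}\, dV \qquad (15)$$

Use $|\langle v_i^j, r \rangle| \le |r|$ (by the Cauchy-Schwarz inequality):

\begin{align*}
\mathrm{Var}[Z_j] &\le c^2 \sum_i \sum_{i'} \int_{R^{2(p-1)} \setminus \Omega_{ii'}} |r|^2 \frac{e^{-|r|^2/2}}{(2\pi)^{p-1}}\, dV \\
&\le c^2 n^2 \int_{|r| > 1/pc} |r|^2 \frac{e^{-|r|^2/2}}{(2\pi)^{p-1}}\, dV
= 2c^2 n^2 (p-1)\, e^{-1/2c^2p^2} \sum_{k=0}^{p-1} \frac{1}{(2c^2p^2)^k k!}
\end{align*}
(16)

as $r \in \Omega_{ii'}$ when $|r| \le 1/pc$.

Letting $c = \alpha/p^{3/2}$ and again applying the same kind of estimates as in Sec. 4
we obtain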


Va.r[Zj] < 



^ ^2^P-3.51osp-p/2q;^-(p- 2) log 2a^ 

2V2t/ 



(17) 



It is easily verified that choosing a = 0.2 results in Var[Zj] < p for all p > 2. 
Chebyshev’s inequality can now be applied: 

Pr[]Z,-E[Z,]l> an] 



(18)

The unbalancedness due to the second kind of error is easily estimated. Let
$W_{ij}$ be the indicator variable for the event “$x_i$ is put in part $j$ by step (3) of the
algorithm”. Applying standard Chernoff bounds (see e.g. Theorems A.4 and A.13
of P) gives



Pr 



IE 



W,, - E[Z,] 



> n' 



3/4 



< e 



— 2n 



1/2 






1/2 



(19) 



I 

The decrease in the objective function due to correcting this small distortion 
turns out to be negligible. 



5.2 A Balancing Scheme and Its Performance 

Suppose that the p parts of the partition have sizes s_0, s_1, ..., s_{p-1} after the rounding step
of the algorithm. Consider the following simple balancing scheme:



(1) S ← ∅

(2) For each i such that s_i > n/p:

(3) Choose T of size s_i − n/p randomly and uniformly
from part i.

(4) S ← S ∪ T

(5) Remove T from part i.

(6) For each i such that s_i < n/p:

(7) Choose T of size n/p − s_i randomly and uniformly
from S.

(8) S ← S − T

(9) Add T to part i.
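The nine steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `balance`, the representation of parts as Python sets, and the optional `rng` argument are not from the paper, and n is assumed to be divisible by p.

```python
import random

def balance(parts, rng=None):
    """Return p sets of equal size n/p, following steps (1)-(9).

    `parts` is a list of p disjoint sets (an assumed representation);
    the total number of vertices n must be divisible by p.
    """
    rng = rng or random.Random()
    p = len(parts)
    n = sum(len(part) for part in parts)
    if n % p != 0:
        raise ValueError("n must be divisible by p")
    target = n // p
    parts = [set(part) for part in parts]   # work on copies
    pool = []                               # the set S of step (1)
    for part in parts:                      # steps (2)-(5)
        if len(part) > target:
            moved = rng.sample(list(part), len(part) - target)
            pool.extend(moved)
            part.difference_update(moved)
    rng.shuffle(pool)                       # popping a shuffled list = uniform choice from S
    for part in parts:                      # steps (6)-(9)
        while len(part) < target:
            part.add(pool.pop())
    return parts
```

Parts that already have size n/p are left untouched, which matches the scheme: only surplus vertices are ever moved.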



How much will the objective value decrease due to balancing? Clearly the 
worst case is when none of the vertices being moved in step (0 of the algorithm 
are endpoints of any cut edges. Next we bound the cost of the balancing. 

Lemma 3. The expected decrease in the expected performance guarantee for 
part i in the partition is at most max 

Proof. Let Q be the number of cut edges with one endpoint in part i of the 
partition. Consider the number of such edges after the set S has been formed 
by repeated applications of step (0. The expected decrease compared to the 
number prior to the balancing algorithm being run is 

$$Q \cdot \frac{\max\{0,\, s_i - n/p\}}{s_i}. \qquad (20)$$

The lemma follows from this and the simple observation that at least a frac- 
tion of the edges are cut in the optimal p-section. 
The analysis from the previous section gives us the tools we need to bound the
expected decrease (over the choice of r) in the expected performance guarantee 
due to balancing. 

Theorem 1. A lnnrith,m,n\ has expected performance guarantee + for 

p > 3 when run with c = 0.2/p^ ®. 

Proof. If we let a = in m and combine this equation with m we obtain 
Pr[max{si — n/p} < p~^n + > 1 — p • (p“® + ' ). (21) 

Lemma 3 can be applied when max{s_i − n/p} is small; otherwise the decrease in
the expected performance guarantee can be bounded from above by 1. Combining
Lemma El with this analysis shows that the expected performance guarantee is 
at least 




V 



1 

3(p 



- p • (p ^ + 



-,-2n 



1/2 






np ® 

n/p — np~^ 



P 

p-1' 



( 22 ) 



For p > 3 and large enough n this clearly is + 0{p ^). 

We defer the analysis of the special case p = 2 (i.e., Max Bisection) to the 
next section. 

This shows that Algorithm [D beats the trivial randomized algorithm. 



Remark 1. This performance guarantee, ^^ + 6*(p ^), is somewhat weaker than 
the + 0(p“^logp) achieved by Frieze and Jerrum 0 for the Max p-CuT 
problem. It may be possible to sharpen the bound for Max p-SECTiON but it 
seems hard to reach + 0{p~^) using the approach taken in this paper. 



6 Modifications for Max Bisection 

One may wonder if the algorithm improves on the algorithm of Frieze and Jerrum
when p = 2. It turns out that a modified rounding scheme improves
the performance of the algorithm.



(O) Generate r G RP by choosing each component as 
A^(0, 1) independently. For each vertex Xi and each 
j = 0, 1, ... ,p - 1, let qij = ^ + c(r, v)). Now fix i. 
If all qij are in [0, 1], set pij = qij for all j, otherwise set 
Pij = 0 if qij < 0 and pij = 1 if qij > 1. 



For p > 2 this might lead to $\sum_j p_{ij} \neq 1$ for some i, but for p = 2 it is a
generalization of Frieze and Jerrum’s algorithm:
Theorem 2. Frieze and Jerrum’s algorithm for Max Bisection has the same
worst-case behavior as the modified version of the algorithm with c = ∞.

Proof sketch. The semidefinite program is, when p = 2, easily seen to be
equivalent to the one used by Frieze and Jerrum. Their rounding scheme
corresponds to the limit c → ∞ in the modified algorithm above. The greedy
balancing scheme they use is equivalent to our probabilistic scheme in the worst
case.

Numerical simulations indicate that c = ∞ is the best choice in the modified algorithm,
so our approach does not provide a better approximation algorithm for Max
Bisection.

7 Open Problems 

Lower bounds

There are no strong lower bounds on the approximability of the Max p-Section
problem, probably because the most successful technique for proving
lower bounds, probabilistically checkable proofs (PCP), focuses on local
properties of problems. For the Max Cut problem, it has been shown that
there cannot exist any approximation algorithm with performance guarantee
better than 16/17 unless P = NP [61/]. PCPs have been less useful on problems
with global constraints than on pure constraint satisfaction problems.

Better algorithms for small p

In practice, special cases of Max p-Section where p is small, especially
Max Bisection, are the most important ones. Is it possible to beat the
0.651-approximation algorithm for Max Bisection?
Abstract. We examine the bandwidth problem in circular-arc graphs,
chordal graphs with a bounded number of leaves in the clique tree, and
k-polygon graphs (fixed k). We show that all of these graph classes admit
efficient approximation algorithms which are based on exact or approximate
bandwidth layouts of related interval graphs. Specifically, we obtain
a bandwidth approximation algorithm for circular-arc graphs that executes
in O(n log² n) time and has performance ratio 2, which is the best
possible performance ratio of any polynomial time bandwidth approximation
algorithm for circular-arc graphs. For chordal graphs with not
more than k leaves in the clique tree, we obtain a performance ratio of
2k in O(k(n + m)) time, and our algorithm for k-polygon graphs has
performance ratio 2k² and runs in time O(n³).



1 Introduction 

A layout of a graph G = (V, E) is an assignment of distinct integers from {1, ..., n}
to the elements of V. Equivalently, a layout L may be thought of as an ordering
L(1), L(2), ..., L(n) of V, where |V| = n. We shall use <_L to denote the ordering
of the elements in a layout L. The width of a layout L, b(G, L), is the maximum
over all edges {u, v} of G of |L(u) − L(v)|. That is, it is the length of the longest
edge in the layout. The bandwidth of G, bw(G), is the minimum width over all
layouts. A bandwidth layout for graph G is a layout satisfying b(G, L) = bw(G).
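The definitions of width and bandwidth translate directly into code. The following is a minimal sketch for illustration only; the function names `width` and `bandwidth` and the edge-list representation are assumptions, and the exhaustive search is feasible only for very small graphs.

```python
from itertools import permutations

def width(edges, layout):
    """b(G, L): the longest edge when each vertex sits at its layout position."""
    pos = {v: i for i, v in enumerate(layout, start=1)}
    return max((abs(pos[u] - pos[v]) for u, v in edges), default=0)

def bandwidth(vertices, edges):
    """bw(G): minimum width over all n! layouts (brute force, tiny graphs only)."""
    return min(width(edges, layout) for layout in permutations(vertices))
```

For a path on four vertices the natural ordering gives width 1, while a star with three leaves cannot do better than 2, because the centre has three neighbours and only two adjacent positions.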

The problem of finding the bandwidth of a graph has applications in sparse 
matrix computations. An overview of the bandwidth problem is given in |S|. The 
minimum bandwidth decision problem (Given a graph G = (V, E) and integer k, 
is bw(G) < k?) is known to be NP-complete even for trees having maximum 
degree 3 H2|, caterpillars with hairs of length at most 3 jzm and cobipartite 
graphs mi- The problem is polynomially solvable for caterpillars with hairs of 
length 1 and 2 | 2 ], cographs HSI, and interval graphs 

Until recently, not much was known about the approximation hardness of the
bandwidth minimization problem for graphs in general. Recently Feige presented
an approximation algorithm with performance ratio O(log^^^n) |l ij. Very recently
Unger has shown in m that, assuming P ≠ NP, there is no polynomial
time approximation algorithm with constant performance ratio for the band- 
width minimization problem for graphs, even when the inputs are restricted to 
a special class of trees known as caterpillars of maximum degree three. 

Since the bandwidth minimization problem remains NP-complete for such 
simple classes of graphs, and since no polynomial time algorithm for approxi- 
mating the bandwidth of general graphs, or even trees, to within a constant factor 
exists unless P=NP, it is worthwhile to investigate approximation algorithms for 
this problem on restricted classes of graphs. Some results in this direction have 
been presented in Ea 

In this paper, we examine the bandwidth problem in circular-arc graphs,
chordal graphs with a bounded number of leaves in the clique tree, and k-polygon
graphs (fixed k). All of these graph classes admit efficient approximation
algorithms which are based on exact or approximate bandwidth layouts of related
interval graphs. Specifically, we obtain a bandwidth approximation algorithm
for circular-arc graphs that has performance ratio 2 and executes in O(n log² n)
time, or performance ratio 4 while taking O(n) time. For chordal graphs with
not more than k leaves in the clique tree, we obtain a performance ratio of 2k
in O(k(n + m)) time, and our algorithm for k-polygon graphs has performance
ratio 2k² and runs in time O(n³).

Finally it is worth mentioning that our approximation algorithm with perfor- 
mance ratio 2 for circular-arc graphs has optimal performance ratio, since there 
is no polynomial time bandwidth approximation algorithm for (unit) circular-arc 
graphs with performance ratio 2 − ε for any ε > 0 unless P=NP m

2 Preliminaries 

For G = (V, E), we will denote |V| as n and |E| as m. We sometimes refer to the
vertex set of G as V(G) and the edge set as E(G). We let N(v) denote the set
of vertices adjacent to v. The degree of a vertex v, degree(v), is the number of
vertices adjacent to v. Δ(G) denotes the maximum degree of a vertex in graph
G. The subgraph of G = (V, E) induced by V' ⊆ V will be referred to as G[V'].

The following well-known lower bound on the bandwidth of a graph is given 
in 0. 

Lemma 1. [The degree bound] Q/ For any graph G, bw(G) ≥ Δ(G)/2.

The distance in graph G = (V, E) between two vertices u, v ∈ V, d_G(u, v),
is the length of a shortest path between u and v in G. For any graph G =
(V, E), the dth power of G, G^d, is the graph with vertex set V and edge set
{{u, v} | d_G(u, v) ≤ d}.
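The dth power of a graph can be computed with a depth-limited breadth-first search from every vertex. The sketch below is illustrative; the name `graph_power` and the adjacency-dictionary input are assumptions, not notation from the paper.

```python
from collections import deque

def graph_power(adj, d):
    """Edge set of G^d: unordered pairs of distinct vertices at distance <= d in G.

    `adj` maps each vertex to an iterable of its neighbours (assumed format).
    """
    edges = set()
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == d:          # do not explore beyond distance d
                continue
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        edges.update(frozenset((source, t)) for t in dist if t != source)
    return edges
```

On a path with four vertices, G^1 has the three original edges, G^2 adds the two pairs at distance two, and G^3 is the complete graph on four vertices.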

Lemma 2. [The distance bound] \r/^ (also attributed in part to ^ in m) Let G
and H be graphs with the same vertex set V, such that E(G) ⊆ E(H) ⊆ E(G^d) or
E(H) ⊆ E(G) ⊆ E(H^d) for an integer d ≥ 1, and let L be an optimal layout for
H, i.e., b(H, L) = bw(H). Then L approximates the bandwidth of G by a factor
of d, i.e., b(G, L) ≤ d·bw(G).

Many references, including mi, contain comprehensive overviews of the many 
known structural and algorithmic properties of interval graphs. 

Definition 1. A graph G = (V, E) is an interval graph if there is a one-to-one 
correspondence between V and a set of intervals of the real line such that, for all 
u, v ∈ V, {u, v} ∈ E if and only if the intervals corresponding to u and v have a
nonempty intersection. 

A set of intervals whose intersection graph is G is termed an interval model 
for G. Many algorithms exist which, given a graph G = (V, E), determine whether 
or not G is an interval graph and, if so, construct an interval model for it, in 
0(n + m) time (see, for example, [418) 1. We assume that an interval model is 
given by a left endpoint and a right endpoint for each interval, namely, left(v) and
right(v) for all v ∈ V. Furthermore, we assume that we are also given a sorted
list of the endpoints, and that the endpoints are distinct. We will sometimes 
blur the distinction between an interval and its corresponding vertex, when no 
confusion can arise. 

Polynomial time algorithms for computing the exact bandwidth of an inter- 
val graph have been given in For an interval graph with n vertices, 

Kleitman and Vohra’s algorithm solves the decision problem (bw(G) ≤ k?) in
O(nk) time and can be used to produce a bandwidth layout in O(n² log n) time,
and Sprague has shown how to implement Kleitman and Vohra’s algorithm to
answer the decision problem in O(n log n) time and thus produce a bandwidth
layout in O(n log² n) time.

The following two lemmas demonstrate that, for interval graph G, a layout
L with b(G, L) ≤ 2 · bw(G) can be obtained in time O(n), assuming the sorted
interval endpoints are given.

Lemma 3. Given an interval graph G, the layout L consisting of vertices ordered
by right endpoints of corresponding intervals has b(G, L) ≤ 2 · bw(G).

Proof. Let L be the layout of vertices ordered by right interval endpoints. We first
observe that, for all u, v ∈ V such that {u, v} ∈ E and u <_L v, all vertices between
u and v in L are adjacent to v. Now consider a longest edge in L, i.e., an edge {u, v}
such that |L(u) − L(v)| = b(G, L). Assume, without loss of generality, that u <_L v.
From the previous observation, it must be that degree(v) ≥ L(v) − L(u) = b(G, L).
Now the degree bound (Lemma 1) implies bw(G) ≥ b(G, L)/2. □
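As a small illustration of Lemma 3, the sketch below builds the right-endpoint layout of an interval model and measures its width. The helper names and the dictionary format `{vertex: (left, right)}` are assumptions made for this example; the quadratic edge enumeration is for clarity and is not the linear-time procedure the lemma refers to.

```python
def interval_edges(intervals):
    """Edges of the intersection graph of closed intervals {name: (left, right)}."""
    names = list(intervals)
    return [(u, v)
            for i, u in enumerate(names)
            for v in names[i + 1:]
            if intervals[u][0] <= intervals[v][1]
            and intervals[v][0] <= intervals[u][1]]

def right_endpoint_layout(intervals):
    """Vertices ordered by increasing right endpoint (the layout of Lemma 3)."""
    return sorted(intervals, key=lambda v: intervals[v][1])

def width(edges, layout):
    """b(G, L): the longest edge in the layout."""
    pos = {v: i for i, v in enumerate(layout, start=1)}
    return max((abs(pos[u] - pos[v]) for u, v in edges), default=0)
```

With one long interval covering three pairwise disjoint short ones, the graph is a star with three leaves. The right-endpoint layout places the centre last and has width 3, which equals the maximum degree and is at most twice the bandwidth of 2.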

Lemma 4. Given an interval graph G, the layout L consisting of vertices ordered
by left endpoints of corresponding intervals has b(G, L) ≤ 2 · bw(G).

Proof. Consider a set of intervals representing G, and the layout L, ordered 
by left endpoints. Now, flipping the intervals of the model horizontally results 
in another interval representation for G, and the ordering of vertices by right 
endpoints of these intervals is the reversal of L. Thus, this lemma follows from 
the previous one. □ 
We will use the following lemma in subsequent sections of the paper.

Lemma 5. Let I be a set of intervals on the real line corresponding to interval
graph G = (V, E). Let p_1 be a point on the line such that at least one interval
endpoint is to the left of p_1 and only left endpoints are to the left of p_1. Let
p_2 be a point on the line such that at least one interval endpoint is to the right
of p_2 and only right endpoints are to the right of p_2. Let C_1 be the set of all
intervals that contain p_1, and C_2 be the set of all intervals that contain p_2. If
L is a layout for G in which vertices are ordered by increasing left endpoints
of corresponding intervals or by increasing right endpoints, or if L is a layout
produced by Kleitman and Vohra’s bandwidth algorithm m, then
(i) ∀v ∈ C_1: {v, L(1)} ∈ E, and
(ii) ∀v ∈ C_2: {v, L(n)} ∈ E.

Proof. Part (i) for the left endpoint ordering follows from the fact that L(1) ∈ C_1
and C_1 is a clique. In the other two layouts, L(1) is the interval with smallest
right endpoint. This interval is either in C_1 or is contained in all intervals of C_1.
Thus, (i) holds for the three layouts.

Part (ii) follows immediately for the right endpoint layout, since L(n) ∈ C_2.
In the left endpoint order, L(n) is either in C_2 or contained in all intervals of
C_2, implying (ii).

The proof of Part (ii) for Kleitman-Vohra layouts heavily relies on details of
the algorithm in m and is omitted here. □

3 Circular-Arc Graphs

Circular-arc graphs are the intersection graphs of arcs on a circle. Thus, a graph
G = (V, E) is a circular-arc graph if and only if it has a (not necessarily unique)
circular-arc model or representation, consisting of a set of arcs on a circle, such
that, for all u, v ∈ V, {u, v} ∈ E if and only if the arcs corresponding to u and
v have a nonempty intersection. In such a model, we assume, without loss of
generality, that the arc endpoints are distinct, and we label the endpoints from
1 to 2n in clockwise order around the circle, starting at an arbitrary endpoint.
Thus, each vertex v ∈ V corresponds to an arc given by its counterclockwise
endpoint, ccw(v), and its clockwise endpoint, cw(v). We refer to any segment
of the circle by its two endpoints and the direction of traversal, i.e., [p_1, p_2]_cw
refers to the closed arc covered by a clockwise traversal beginning at p_1 and
ending at p_2. The arc [p_1, p_2]_ccw is the set of all points in a counterclockwise
traversal from p_1 to p_2, and parentheses will indicate that the arc is open at one
or both ends. Note that, for any two points (not necessarily arc endpoints) on
the circle, p_1 and p_2, the arcs [p_1, p_2]_cw and [p_1, p_2]_ccw cover the entire circle,
and their intersection is {p_1, p_2}.

Eschen and Spinrad m have given an O(n²) algorithm which determines
whether or not an n-vertex graph is a circular-arc graph. If so, the algorithm 
produces a circular-arc model for the graph. Our algorithms assume that the 
input circular-arc graph is given as a set of arcs on a circle. 
Henceforth, we will refer to a set of 2n scanpoints on the circle, none of
which is an arc endpoint, such that exactly one of these points is between each 
consecutive pair of arc endpoints. We shall label these points from 1 to 2n in 
clockwise order, beginning at any one. 

Our bandwidth approximation algorithm works as follows, for a circular-arc 
graph G. Roughly speaking, we cut the circular-arc representation in half, to 
form two equal-sized interval graphs, compute exact or approximate bandwidth 
layouts for the two interval graphs, and then mix the two layouts to form an 
approximate bandwidth layout for G. 

Let G = (V, E) be a circular-arc graph with corresponding circular-arc representation.
The first step is to find a scanpoint p on the circle such that
|C_1 ∪ C_2 ∪ A| = |C_1 ∪ C_2 ∪ B|, where C_1 is the set of arcs that contain scanpoint
1, C_2 is the set of arcs that contain scanpoint p, A is the set of arcs entirely
contained in (1, p)_cw, and B is the set of arcs entirely contained in (1, p)_ccw.
Note that C_1 ∪ C_2 ∪ A ∪ B = V. We will use scanpoints 1 and p to cut the circle
and create two equal-sized interval graphs.



Procedure FINDp

C_1 ← C_2 ← all arcs that contain scanpoint 1; A ← ∅; B ← V \ C_1
a ← |C_1|; b ← n   {a = |C_1 ∪ C_2 ∪ A|; b = |C_1 ∪ C_2 ∪ B|}
p ← 1
repeat until a = b or p = 2n
    {Invariant: a ≤ b}
    {Variant: 2n − p}
    p ← p + 1
    if the endpoint between p − 1 and p is a ccw endpoint (say of arc i) then
        C_2 ← C_2 ∪ {i}
        if i ∉ C_1 then
            B ← B \ {i}
            a ← a + 1
    if the endpoint between p − 1 and p is a cw endpoint (of arc i) then
        C_2 ← C_2 \ {i}
        if i ∉ C_1 then
            A ← A ∪ {i}
            b ← b − 1
{Now C_2 is the set of arcs that contain point p}
{|C_1 ∪ C_2 ∪ A| = |C_1 ∪ C_2 ∪ B|}
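Procedure FINDp can be sketched in Python as follows. Several details are assumptions made only for this illustration: arcs are given as a dictionary `{name: (ccw, cw)}` with endpoints labelled 1 to 2n clockwise, scanpoint q is taken to lie just before endpoint q, so that the endpoint between scanpoints p − 1 and p is endpoint p − 1, and the loop condition is tested before the first iteration.

```python
def find_p(arcs):
    """Return (p, C1, C2, A, B) with |C1 ∪ C2 ∪ A| == |C1 ∪ C2 ∪ B|.

    `arcs` maps an arc name to (ccw, cw) endpoint labels in 1..2n
    (clockwise numbering; assumed input format).
    """
    n = len(arcs)
    owner = {}                                  # endpoint label -> (arc, kind)
    for name, (ccw, cw) in arcs.items():
        owner[ccw] = (name, "ccw")
        owner[cw] = (name, "cw")
    # An arc contains scanpoint 1 exactly when it wraps past label 2n.
    c1 = {name for name, (ccw, cw) in arcs.items() if ccw > cw}
    c2 = set(c1)
    a_set, b_set = set(), set(arcs) - c1
    a, b = len(c1), n                           # a = |C1∪C2∪A|, b = |C1∪C2∪B|
    p = 1
    while a != b and p < 2 * n:
        p += 1
        name, kind = owner[p - 1]
        if kind == "ccw":                       # the arc starts: it now contains p
            c2.add(name)
            if name not in c1:
                b_set.discard(name)
                a += 1
        else:                                   # the arc ends: it lies before p
            c2.discard(name)
            if name not in c1:
                a_set.add(name)
                b -= 1
    return p, c1, c2, a_set, b_set
```

For four arcs w = (8, 1), x = (2, 4), y = (3, 6), z = (5, 7), the scan stops at p = 5 with C_1 = {w}, C_2 = {y}, A = {x}, and B = {z}, so both sides of the cut contain three arcs.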



Claim. Procedure FINDp will terminate with a = b. 

Proof. It is a matter of routine to verify the stated invariant and variant. If the
loop terminates with p = 2n then all arc endpoints will have been examined. For
all arcs except those of C_1, a will have been incremented by 1 and b will have
been decremented by 1. Let a_i and a_f be the initial and final values, respectively,
of variable a, and b_i and b_f the initial and final values, respectively, of variable b.
Upon termination of the loop with p = 2n, a_f = a_i + n − |C_1| = |C_1| + n − |C_1| = n
and b_f = b_i − (n − |C_1|) = n − n + |C_1| = |C_1|. But then b_f < a_f (assuming
C_1 ≠ V), contradicting our invariant. □

We may assume that A and B will be nonempty; otherwise G can be parti- 
tioned into two cliques, one of which must have size at least n/2, implying (by 
Lemma Pi bw(G) > n/2 — 1 . Thus, any layout in which the first and last vertices 
are not adjacent is a 2-approximation. 

We now describe how to construct two interval subgraphs of G by cutting the
circle at scanpoints 1 and p. We wish to cut the circle and the arcs of C_1 and C_2
at scanpoints 1 and p, producing two line segments, each with a set of intervals
that correspond to an interval graph. However, if any arc, say v, contains both
scanpoints 1 and p then it covers one entire part of the circle (i.e. [1, p]_cw or
[1, p]_ccw) and appears as two disconnected pieces in the other part. Thus, this
second part of the circle may not correspond to an interval subgraph, as vertex
v is represented by two disconnected intervals. We eliminate this problem by
shrinking v’s arc on the circle so that it no longer contains p and thus v is
removed from C_2. The altered set of arcs might not represent all of the edges of
G; specifically, some edges between v and elements of A (or B) may be missing.
Let E' denote edges of G that are not represented by the changed arcs. Note
that the sets C_1 ∪ C_2 ∪ A and C_1 ∪ C_2 ∪ B remain unchanged.

Now, we can cut the circle and the arcs of C_1 and C_2 at scanpoints 1 and
p, producing two line segments, [1, p]_cw and [1, p]_ccw. The arcs of the circular-arc
model become intervals on the two lines. Let I_A (respectively I_B) be the
resulting set of intervals on the line segment [1, p]_cw (respectively [1, p]_ccw). We
may assume that the intervals of C_1 ∪ C_2 are altered slightly in I_A and in I_B
without changing intersections, so that interval endpoints are distinct.

Let G_A = (V_A, E_A) and G_B = (V_B, E_B) be the intersection graphs of I_A and
I_B, respectively. Now, G_A and G_B are both interval graphs and (not necessarily
induced) subgraphs of G. Furthermore, |V_A| = |V_B|, and E_A ∪ E_B ∪ E' = E. Figure
1 illustrates this process.

Our method for obtaining an approximate bandwidth layout for a circular-arc
graph is to first compute exact or approximate bandwidth layouts, L_A and
L_B, for G_A and G_B, respectively, and then mix the two layouts.

Different methods of computing L_A and L_B yield different approximation
bounds and time complexities for our algorithm.

Regardless of how we obtain L_A and L_B, the mixing is done as follows.

Let k = |C_1 ∪ C_2 ∪ A| = |C_1 ∪ C_2 ∪ B|.

Given

L_A = L_A(1), L_A(2), ..., L_A(k)

and

L_B = L_B(1), L_B(2), ..., L_B(k)

we begin by producing

L_M = L_A(1), L_B(1), L_A(2), L_B(2), ..., L_A(k), L_B(k).
Fig. 1. Cutting the circular-arc model to form two interval graphs

For convenience, we will refer to elements of L_A as having the colour red and
elements of L_B as having the colour blue. Notice that L_M will contain two copies
of each vertex of C_1 ∪ C_2, one red and one blue. For each v ∈ C_1 ∪ C_2, we shall
distinguish between the two copies of v in L_M as follows: the red copy will be
referred to as v_red and the blue as v_blue. Each vertex of A ∪ B occurs only once
in L_M.

From L_M, we produce L by deleting the leftmost copy of each vertex of C_1
and the rightmost copy of each vertex of C_2. Recall that we constructed C_1 and
C_2 so that no vertex appears in both. Thus, L is a layout for G.
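The mixing step just described amounts to interleaving the two layouts and then dropping one copy of each duplicated vertex. A minimal sketch, in which the function name `mix_layouts` and the list and set representations are assumptions:

```python
def mix_layouts(layout_a, layout_b, c1, c2):
    """Interleave two equal-length layouts, then delete the leftmost copy of
    every vertex in c1 and the rightmost copy of every vertex in c2."""
    mixed = [v for pair in zip(layout_a, layout_b) for v in pair]
    drop = set()
    seen = set()
    for idx, v in enumerate(mixed):              # scan left to right for c1
        if v in c1 and v not in seen:
            drop.add(idx)
            seen.add(v)
    seen = set()
    for idx in range(len(mixed) - 1, -1, -1):    # scan right to left for c2
        v = mixed[idx]
        if v in c2 and v not in seen:
            drop.add(idx)
            seen.add(v)
    return [v for idx, v in enumerate(mixed) if idx not in drop]
```

With layouts [w, x, y] and [w, z, y], C_1 = {w} and C_2 = {y}, the interleaving is w, w, x, z, y, y, and removing the first w and the last y leaves the layout w, x, z, y.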

Lemma 6. Let G = (V, E) be a circular-arc graph, and let I_A, I_B, G_A, and G_B
be constructed as previously described, from a circular-arc model for G. Let L_A
and L_B be layouts for G_A and G_B, respectively, satisfying:

• ∀v ∈ C_1: {v, L_A(1)}, {v, L_B(1)} ∈ E, and

• ∀v ∈ C_2: {v, L_A(k)}, {v, L_B(k)} ∈ E.

Let L_M and L be obtained from L_A and L_B as previously described. Then

b(G, L) ≤ 2 · max[b(G_A, L_A), b(G_B, L_B)].

The proof of the lemma is omitted due to space restrictions. 

Theorem 1. The bandwidth of a circular-arc graph can be approximated to
within a factor of four in O(n) time, and to within a factor of two in O(n log² n)
time.

Proof. We have three approximation algorithms for approximating the band- 
width of a circular-arc graph, namely, the algorithm previously described in 
which: 

(i) La and Lb are layouts of vertices ordered by left endpoints of intervals, 

(ii) La and Lb are layouts of vertices ordered by right endpoints of intervals, or 

(iii) La and Lb are layouts computed by Kleitman and Vohra’s algorithm. 
Algorithms (i) and (ii) have time complexity O(n), provided the sorted arc
endpoints are given, and they output a layout L that satisfies:

b(G, L) ≤ 2 · max[b(G_A, L_A), b(G_B, L_B)]

≤ 2 · max[2 · bw(G_A), 2 · bw(G_B)]

= 4 · max[bw(G_A), bw(G_B)]

≤ 4 · bw(G)

Algorithm (iii) requires O(n log² n) time but produces a layout L satisfying:

b(G, L) ≤ 2 · max[b(G_A, L_A), b(G_B, L_B)]

≤ 2 · max[bw(G_A), bw(G_B)]

≤ 2 · bw(G)



These performance ratios follow from Lemmas 0 and El and the fact that any 
subgraph of graph G has bandwidth not larger than bw(G). □ 

4 Chordal Graphs with Clique Trees Having a Bounded 
Number of Leaves 

A graph G is a chordal graph if every cycle of length greater than three has a 
chord. Chordal graphs are exactly the intersection graphs of subtrees in a tree 
m- More precisely, for each chordal graph G = (V, E), there exists a tree T such 
that the vertices of T correspond to the maximal cliques of G , and the vertices of 
T corresponding to cliques of G containing any fixed vertex v ∈ V form a subtree
T_v of T. Note the consequence that two vertices of G are adjacent if and only
if their corresponding subtrees have nonempty intersection. For a given chordal 
graph G = (V, E), such a tree, called a clique tree for G, will have no more than 
n nodes and can be constructed in 0(n + m) time 0. 

We use the idea of mixing layouts of interval graphs, as in the previous sec- 
tion. While a circular-arc graph roughly consists of two interval graphs arranged 
in a circle, a chordal graph may be thought of as several interval graphs arranged 
in a tree-like structure. A chordal graph with k leaves in its clique tree may be 
viewed as a collection of k interval graphs. 

Our algorithm is as follows, assuming a clique tree T has been computed for 
a given chordal graph G = (V, E). 

1. Root T at an arbitrary vertex, r.

2. Let k be the number of leaves of T (excluding r). For each root-to-leaf path
P_i in T, the collection of subtrees, restricted to P_i, form a set of intervals. Let
I_i be this set of intervals in which the left endpoint of each interval is taken
to be the one closer to r. Let G_i = (V_i, E_i) be the corresponding interval
graph.

3. for i ← 1 to k do

L_i ← layout for G_i consisting of V_i ordered by increasing left endpoints
of intervals (with ties broken arbitrarily, but the same way in all
the L_i’s)

4. Mix the L_i’s to form L_M, as follows:

L_M ← L_1(1) L_2(1) L_3(1) ... L_k(1) L_1(2) L_2(2) ... L_k(2) ...

5. For each vertex v ∈ V that appears in more than one of the G_i’s: delete all
but the rightmost copy of v from L_M. The result is a layout L for G.
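Steps 4 and 5 can be sketched as a round-robin interleaving followed by a filter that keeps only the last occurrence of each vertex. The function name is hypothetical, and skipping over layouts that have run out of vertices is an assumption, since the text does not spell out how layouts of unequal length are handled.

```python
from itertools import zip_longest

def mix_keep_rightmost(layouts):
    """Interleave k layouts position by position (step 4), then keep only the
    rightmost copy of each vertex (step 5)."""
    skip = object()                              # filler for exhausted layouts
    mixed = [v
             for column in zip_longest(*layouts, fillvalue=skip)
             for v in column
             if v is not skip]
    last = {v: idx for idx, v in enumerate(mixed)}   # final position of each vertex
    return [v for idx, v in enumerate(mixed) if last[v] == idx]
```

For the three layouts [r, a, b], [r, a, c], and [r, d], the interleaving is r, r, r, a, a, d, b, c, and keeping rightmost copies yields r, a, d, b, c.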

The proof of the following theorem is omitted due to space restrictions. 

Theorem 2. Let G = (V, E) be a chordal graph having a clique tree with at most
k leaves. Then a layout L for G satisfying b(G, L) ≤ 2k · bw(G) can be computed
in O(k(n + m)) time by the algorithm described above.



5 k-Polygon Graphs for Fixed k 

We make use of the results of the previous section as follows. We transform any
k-polygon graph G into a chordal graph H having a clique tree with at most
k leaves, by taking a minimal triangulation of the input graph. We show that
there exists such a triangulation which is a subgraph of G^k. Combining these
observations with Lemma 2 and the approximation algorithm of the previous
section, we obtain an O(n³) approximation algorithm for the bandwidth of
k-polygon graphs which has performance ratio 2k².

A graph G = (V, E) is a k-polygon graph if it is the intersection graph of 
chords inside a convex k-polygon, where each chord has its endpoints on two 
different sides of the polygon. A polygon representation, or diagram, for G = 
(V, E), is a k-sided polygon together with a set of chords such that, for all u, v ∈
V, {u, v} ∈ E if and only if the chords corresponding to u and v cross.

Circle graphs are the intersection graphs of chords inside a circle. Thus, 
circle graphs are the union of all k-polygon graphs, over all k > 2. Given a graph 
G = (V, E), it can be determined in 0(|V|'^) time whether or not G is a k-polygon 
graph and, if so, a polygon representation can be constructed 0. However, the 
general problem, given a circle graph, determine the minimum k such that G is 
a k-polygon graph, remains NP-complete |0| . 

Our algorithm assumes that a k-polygon representation for the input graph 
G is provided. 

Definition 2. A triangulation of a graph G is a chordal graph H with the same
vertex set as G, such that G is a subgraph of H. A triangulation H of a graph G is
called a minimal triangulation of G, if no proper subgraph of H is a triangulation
of G.

Our bandwidth approximation algorithm works as follows. First it computes
a minimum triangulation H (i.e., one with the minimum number of edges) for the
given graph G using an O(n³) algorithm for circle graphs presented in M. Hence
H is a minimal triangulation of G. Any minimal triangulation of a circle graph,
and thus of a k-polygon graph, can be represented by a planar triangulation of
a particular convex polygon m. Now our algorithm takes the dual graph of
the planar triangulation of H (except the exterior face) which is a tree with not 
more than k leaves and constructs a clique tree of H with not more than k leaves. 
Finally the algorithm of the previous section is applied to H and its clique tree. 
For the overall algorithm we obtain the following. (For details we refer to the 
full version of our paper.) 

Theorem 3. The algorithm described above computes for a k-polygon graph
given as a k-polygon representation a layout L satisfying b(G, L) ≤ 2k² · bw(G).
It executes in time O(n³).
Abstract. A new approximation algorithm for maximum weighted matching
in general edge-weighted graphs is presented. It calculates a matching
with an edge weight of at least 1/2 of the edge weight of a maximum
weighted matching. Its time complexity is O(|E|), with |E| being the
number of edges in the graph. This improves over the previously known
1/2-approximation algorithms for maximum weighted matching which require
O(|E| · log(|V|)) steps, where |V| is the number of vertices.

1 Introduction 

Graph Matching is a fundamental topic in graph theory. Let G = (V, E) be a
graph with vertices V and undirected edges E without multi-edges or self-loops.
A matching of G is a subset M ⊆ E, such that no two edges of M are adjacent.
A vertex incident to an edge of M is called matched and a vertex not incident 
to an edge of M is called free. An enormous amount of work has been done in 
matching theory in the past. Different types of matchings have been discussed, 
their existence and properties have been analyzed and efficient algorithms for 
the calculating of specific matchings have been developed. Many results have 
been achieved for specific types of graphs like bipartite, planar or other ones. 

A central aspect is matchings with high cardinality. A Maximal Matching
M_MAX is a matching which cannot be enlarged by an additional edge without
breaking the matching property. A graph may have several different maximal
matchings and, especially, maximal matchings of different cardinality. A Maximum
Cardinality Matching M_MCM is a matching of maximum size, i.e. for all
matchings M of G, |M_MCM| ≥ |M| holds. Matchings are also discussed for graphs
with edge weights w : E → ℝ. For a set F ⊆ E let W(F) := Σ_{{a,b} ∈ F} w({a, b})
be the weight of F. A Maximum Weighted Matching M_MWM is a matching of
highest weight, i.e. for all matchings M of G, W(M_MWM) ≥ W(M) holds.

Many algorithms for the calculating of matchings have been developed in 
the past. Please consult e.g. ISI9I for the history of matching algorithms. In 
the following, only the currently fastest algorithms are stated. Simple methods 

* Supported by DFG/HNI-Graduiertenkolleg ’’Parallele Rechnernetze in der Produk- 
tionstechnik” and DFG Sonderforschungsbereich 376: ’’Massive Parallelitat” 



C. Meinel and S. Tison (Eds.): STACS’99, LNCS 1563, pp. 259-^221 1999. 
(c) Springer- Verlag Berlin Heidelberg 1999 






Robert Preis 



with time complexity O(|E|) can be used to calculate maximal matchings. The
fastest algorithm for maximum cardinality matching up to date is by Micali and
Vazirani m which has a time complexity of O(|E| · √|V|). In the edge-weighted
case, the algorithm by Gabow 0 calculates a maximum weighted matching
in time O(|V| · |E| + |V|² log(|V|)). This time complexity has been improved
by Gabow and Tarjan ^ under the assumption of integral costs that are not
particularly high: if the weight function w : E → [−N, N] assigns only integers
between −N and N, their algorithm will run in O(√(|V| · α(|E|, |V|) · log(|V|)) ·
|E| · log(|V| · N)) time, where α is the inverse of Ackermann’s function.

All algorithms discussed so far have super-linear time complexity. Recently,
approximation algorithms for matching problems have attracted more and more
attention. They have a smaller time complexity than an optimal algorithm and
calculate suboptimal solutions. The guaranteed quality is described by an
approximation factor which states the worst-case loss relative to an optimal
solution, e.g. a factor of $\frac{1}{2}$ guarantees that the solution quality
is at least half the value of the optimum solution. It is a simple exercise to
prove that any maximal matching $M_{MAX}$ has a cardinality of at least
$\frac{1}{2}$ the cardinality of a maximum cardinality matching, i.e.
$|M_{MAX}| \ge \frac{1}{2} |M_{MCM}|$. Therefore, any algorithm for maximal
matching is a $\frac{1}{2}$-approximation algorithm for maximum cardinality
matching.
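
As an illustration of the simple linear-time method mentioned above, the following is a minimal Python sketch of a maximal matching computed in a single scan over the edge list. The function name `maximal_matching` and the edge-list input format are assumptions made for this example and do not come from the paper.

```python
def maximal_matching(edges):
    """Scan the edges once; keep an edge whenever both of its ends are still free.

    `edges` is an iterable of (u, v) pairs. The result is a maximal matching,
    so its size is at least half the size of a maximum cardinality matching.
    """
    matched = set()      # vertices already covered by the matching
    matching = []
    for u, v in edges:
        if u != v and u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching


# Path a - b - c - d: scanning the middle edge first gives a maximal matching
# of size 1, while the maximum cardinality matching has size 2.
worst_case = maximal_matching([("b", "c"), ("a", "b"), ("c", "d")])
best_case = maximal_matching([("a", "b"), ("b", "c"), ("c", "d")])
```

The two calls show that the factor $\frac{1}{2}$ can actually be attained: the result depends on the order in which the edges are scanned.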

Augmenting paths are often considered for graph matching, especially for
approximating maximum cardinality matching. An augmenting path is a path with
an odd number of edges which alternate between $E \setminus M$ and $M$ and
which has two free vertices as endpoints, i.e. an augmenting path of length $l$
consists of $\frac{l-1}{2}$ edges of $M$ and $\frac{l+1}{2}$ edges of
$E \setminus M$. If such a path exists, the cardinality of $M$ can be increased
by one by exchanging the matched and unmatched edges of the path. Based on the
work by Hopcroft and Karp [?], it can be shown that if the shortest augmenting
path with respect to a matching $M_l$ has length $l$, then
$|M_l| \ge \frac{l-1}{l+1} |M_{MCM}|$ (see e.g. [?], p. 156).
Matchings without short augmenting paths can be calculated very fast, e.g. a
matching with a shortest augmenting path of length $l \ge 5$ can be computed in
time $O(|E|)$, resulting in $|M_5| \ge \frac{2}{3} |M_{MCM}|$. If the minimum
degree min and maximum degree
max of all vertices are considered, it can easily be proven that if the shortest 
augmenting path has a length / > 5, then \M^\ > 1^1’ "'^hich implies 

l-ATsI > ||K| for graphs with a regular degree, i.e. | of the vertices are matched. 
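
The exchange of matched and unmatched edges along an augmenting path can be sketched as a symmetric difference of edge sets. The following Python fragment is only an illustration: the function name `augment` and the representation of the path as a vertex list are assumptions, and the function does not check that the given path really is augmenting.

```python
def augment(matching, path):
    """Flip matched and unmatched edges along an augmenting path.

    `matching` is an iterable of (u, v) pairs, `path` is the list of vertices
    visited by the path. Edges are handled as frozensets, so orientation does
    not matter. The caller is assumed to pass a valid augmenting path.
    """
    path_edges = {frozenset(e) for e in zip(path, path[1:])}
    current = {frozenset(e) for e in matching}
    # Symmetric difference: path edges of M leave, path edges of E \ M enter.
    return current ^ path_edges


# Path a - b - c - d with M = {{b, c}}: the path has length 3, contains one
# edge of M and two edges of E \ M, and both endpoints a and d are free.
bigger = augment([("b", "c")], ["a", "b", "c", "d"])
```

After the call, the matching consists of the two outer edges of the path, i.e. its cardinality has grown by one.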

For the weighted case, the GREEDY algorithm of Figure 1 calculates a matching
$M_{GREEDY}$ with weight $W(M_{GREEDY}) \ge \frac{1}{2} W(M_{MWM})$, which is
analyzed by Avis [?]. It requires a time of $O(|E| \cdot \log(|V|))$, if the
edges are sorted by their weights in a preprocessing step.

1.1 New Result 

The new algorithm is similar to GREEDY. It guarantees the same approximation
quality, but has only a time complexity of $O(|E|)$. The new algorithm LAM is
described in the following sections and we now state the new theorem:

Theorem 1. Let $G = (V, E)$ be a graph with vertices $V$ and weighted undirected
edges $E$. A matching $M_{LAM}$ of $G$ with an edge weight of at least
$\frac{1}{2}$ of the edge weight of a maximum weighted matching can be computed
in linear time $O(|E|)$.
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GREEDY-Algorithm 

M_GREEDY := ∅;
WHILE (E ≠ ∅)
    take an edge {a, b} ∈ E with highest weight;
    add {a, b} to M_GREEDY;
    remove all edges incident to a or b from E;
ENDWHILE

Fig. 1. GREEDY: 1/2-app. alg. for maximum weighted matching in time
O(|E| · log(|V|)).
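
For readers who prefer executable code, the GREEDY scheme of Figure 1 can be sketched in Python as follows. This is a minimal sketch under assumptions: the function name `greedy_matching` and the `(u, v, w)` edge format are chosen for this example, and instead of physically removing edges from E, the sketch skips edges that touch an already matched vertex, which has the same effect.

```python
def greedy_matching(weighted_edges):
    """GREEDY sketch: process edges by decreasing weight, keep the free ones.

    `weighted_edges` is an iterable of (u, v, w) triples.
    Returns the matching as a list of (u, v) pairs and its total weight.
    """
    matched = set()      # vertices already covered by the matching
    matching = []
    total = 0
    # Sorting dominates the running time, O(|E| log |E|) = O(|E| log |V|).
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2], reverse=True):
        if u != v and u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
            total += w
    return matching, total


# Path a - b - c - d with weights 2, 3, 2: GREEDY takes the middle edge
# (weight 3), while the two outer edges together have weight 4.
example = greedy_matching([("a", "b", 2), ("b", "c", 3), ("c", "d", 2)])
```

The example stays within the guaranteed factor: the greedy weight 3 is at least half of the optimum weight 4.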



Proof. Lemma 4 (Section 3) shows that the algorithm LAM of Figure 4
(Section 2) calculates a matching $M_{LAM}$ with an edge weight of at least
$\frac{1}{2}$ the edge weight of a maximum weighted matching, and Lemma 5
(Section 4) shows the time complexity of $O(|E|)$. □

According to Corollary 2 (Section 2), the matching $M_{LAM}$ computed by LAM is
also maximal. As stated above, $|M_{MAX}| \ge \frac{1}{2} |M_{MCM}|$, resulting in:

Corollary 1. The matching $M_{LAM}$ computed by algorithm LAM not only has a
weight of at least $\frac{1}{2}$ the weight of a maximum weighted matching, but
also has a cardinality of at least $\frac{1}{2}$ the cardinality of a maximum
cardinality matching.

1.2 Our Motivation: Matching for Multilevel Graph Partitioning 

Graph matching has a wide range of applications, some of which are discussed
by Lovasz and Plummer in [?]. Our motivation comes from the use of matchings
in a multilevel approach for efficient partitioning of very large graphs. In graph
partitioning, the vertices of a graph are partitioned into a fixed number of
equally sized parts, such that the number of edges connecting vertices of
different parts is minimized. In the weighted case, the sum of weights of
crossing edges is minimized. The calculation of a partition with minimum weight
of crossing edges is NP-complete, even for the case of partitioning a graph
with equal edge weights into two parts [?].

Therefore, efficient heuristics are used to calculate good partitions in a
reasonable amount of time, and many freely available partitioning libraries
like PARTY [?] include several different types of methods, but many heuristics
still have a high time complexity when used for very large graphs. The solution
is to coarsen the large graph in several levels to smaller and smaller graphs
with a similar structure, to partition the smallest graph, and to project the
partition found there back through the levels to a partition of the original
graph.

The single coarsening steps between two levels are usually performed by 
matchings, i.e. a matching of the graph is calculated and the vertices incident to 
a matching edge are contracted. It is important to contract those vertices which 
are connected via an edge of a high weight, because it is very likely that this edge 
does not cross between parts in a partition with a low weight of crossing edges.
Generally, the use of a maximum weighted matching would benefit the coarsening
step the most, but the super-linear time complexity of an optimal algorithm is
too high for real examples. Therefore, fast approximation algorithms like the
new one in this paper are very useful here.

2 New Linear Time Approximation Algorithm 

Figure 2 outlines the new LAM algorithm. It starts with an empty matching
$M_{LAM}$ and repeatedly adds an edge {a, b} to $M_{LAM}$ and removes all edges
incident to a or b, because they cannot be part of the final matching. The key
idea is to add locally heaviest edges {a, b}, i.e. edges with a weight at least
as high as the weight of all adjacent edges remaining in E, i.e.
$w(\{a,b\}) \ge w(\{x,y\})$ for any $\{x,y\} \in E$ with a = x or b = x. After
a locally heaviest edge is removed from E, further edges may become locally
heaviest. Note that at least one locally heaviest edge always exists, for
example any remaining edge of globally highest weight.



Outline of LAM-Algorithm 

M_LAM := ∅;
WHILE (E ≠ ∅)
    take a locally heaviest edge {a, b} ∈ E;
    add {a, b} to M_LAM;
    remove all edges incident to a or b from E;
ENDWHILE

Fig. 2. Outline of LAM: Linear time 1/2-app. alg. for maximum weighted
matching.



The main problem is to find such an edge. Figure 3 shows the idea. The
algorithm starts with an arbitrary edge and checks the remaining adjacent edges.
As long as an adjacent edge with higher weight can be found, the algorithm
switches to the new edge and repeats the checking procedure until a locally
heaviest edge is reached, i.e. the weight increases along the path.

The detailed algorithm LAM is shown in Figure 4. It starts with an empty
matching $M_{LAM}$. The global sets U and R store the unchecked edges (U = E at
the start) and the removed edges (R = ∅ at the start). The main algorithm is a
WHILE loop which calls the procedure 'try match' with an arbitrary unchecked
edge {a, b}. This edge is not added to the matching until all adjacent edges to
free vertices are checked for higher weight, which is managed in the WHILE part
of the procedure. Every call of procedure 'try match({a, b})' stores its own
sets of locally checked edges $C_{\{a,b\}}(a)$ and $C_{\{a,b\}}(b)$, depending
on whether the checked edges are incident to a or b. Let C be the union of all
locally checked edges, i.e.
$C := \bigcup_{\text{try match}(\{a,b\})} (C_{\{a,b\}}(a) \cup C_{\{a,b\}}(b))$
(we will show later that 'try match' is called not more than once for every
edge). As long as a and b are free and at least one of them is incident to an
unchecked edge, it is checked and, if it has a higher weight, 'try match' calls
itself recursively with the new edge. Recursive calls are repeated until a
locally heaviest edge is reached. Then, the edge is added to $M_{LAM}$, the
algorithm terminates the current call of 'try match', tracks back the search
path by one edge and continues the WHILE loop by checking further adjacent
edges.

[Figure 3. Legend: current matching M_LAM; edges along search path; edges
incident to a or b; new matching edge (locally heaviest edge). Conditions:
weight of edge 1 < weight of edge 2 < weight of edge 3;
weight of {a,b} >= weight of {a,c_i}; weight of {a,b} >= weight of {b,d_i}.]

Fig. 3. Starting with an arbitrary edge 1, the path progresses along edges 2 and
3 with higher weight until a locally heaviest edge {a, b} is found. (Only edges of
the current matching M_LAM, edges along the path and edges adjacent to {a, b}
are shown.)

The WHILE loop terminates if vertices a and/or b were matched in a recursive
call or if no further adjacent unchecked edge exists. Then in the IF-ELSE part,
{a, b} is added to $M_{LAM}$ if a and b are free. In addition, edges incident to
matched vertices and checked in the current call are removed, i.e. moved from C
to R. If a or b remains free, all edges which were checked in the current call
and are incident to two free vertices are unchecked, i.e. moved back from C to
U. Lemma 5 will show that edges are moved back from C to U only a limited
number of times.

Note that the WHILE loop alternately checks adjacent edges incident to a
and incident to b, as long as both types are available. This property will be
used in Section 4 to show the linear time requirement. Additionally, removed
edges cannot take part in the following matching process, but we need to keep
track of how many edges are removed in every part of the algorithm. Unlike in
the outline of the algorithm in Figure 2, edges incident to matched vertices
are not removed immediately, but removed in those procedure calls of the
algorithm where they were checked the last time.

It is fairly obvious that the unchecked edges in U have the property that both 
incident vertices are free. It holds true for the start. New edges are only moved 
back to U in the IF-ELSE part, but only if they are incident to two free vertices. 
Furthermore, an edge {a, b} may only be matched at the end of the IF-ELSE part. 
In this case, a and b have to be free, and the WHILE loop terminated because 
there was no edge {a, c} or {b, d} in U, i.e. all edges in U keep the property of 
being incident to two free vertices. 

The central part of the algorithm is the procedure 'try match' and the
following lemma shows the number of times it is called:

Lemma 1 (|E| calls). Procedure 'try match' is called only once for every edge.
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LAM- Algorithm 

M_LAM := ∅;    /* empty matching at the start */
U := E;        /* all edges are unchecked at the start */
R := ∅;        /* no removed edges at the start */
WHILE (U ≠ ∅)
    take arbitrary edge {a, b} ∈ U;
    try match ({a, b});
ENDWHILE

PROCEDURE try match ({a, b})
    C_{a,b}(a) := ∅; C_{a,b}(b) := ∅;    /* empty local sets of checked edges at the start */
    WHILE (a is free AND b is free AND (∃{a, c} ∈ U OR ∃{b, d} ∈ U))
        IF (a is free AND ∃{a, c} ∈ U)
            move {a, c} from U to C_{a,b}(a);    /* move from U to C */
            IF (w({a, c}) > w({a, b}))
                try match ({a, c});              /* call heavier edge */
            ENDIF
        ENDIF
        IF (b is free AND ∃{b, d} ∈ U)
            move {b, d} from U to C_{a,b}(b);    /* move from U to C */
            IF (w({b, d}) > w({a, b}))
                try match ({b, d});              /* call heavier edge */
            ENDIF
        ENDIF
    ENDWHILE
    IF (a is matched AND b is matched)
        move edges C_{a,b}(a) and C_{a,b}(b) to R;                         /* move from C to R */
    ELSE IF (a is matched AND b is free)
        move edges C_{a,b}(a) and {{b, d} ∈ C_{a,b}(b) | d is matched} to R;  /* move from C to R */
        move edges {{b, d} ∈ C_{a,b}(b) | d is free} back to U;            /* move from C back to U */
    ELSE IF (b is matched AND a is free)
        move edges C_{a,b}(b) and {{a, c} ∈ C_{a,b}(a) | c is matched} to R;  /* move from C to R */
        move edges {{a, c} ∈ C_{a,b}(a) | c is free} back to U;            /* move from C back to U */
    ELSE /* a is free AND b is free */
        move edges C_{a,b}(a) and C_{a,b}(b) to R;                         /* move from C to R */
        add {a, b} to M_LAM;                                               /* new matching edge {a, b} */
    ENDIF

Fig. 4. LAM: Linear time 1/2-app. alg. for maximum weighted matching.
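
The procedure of Figure 4 can be turned into a short executable sketch. The following Python version is an illustration under several assumptions that are not part of the paper: the names `lam_matching`, `take_from_U`, `back_to_U` and `other` are hypothetical, the input is a list of `(u, v, w)` triples without self-loops, the start edge is taken out of U as soon as the main loop selects it, and the three "move" cases of the IF-ELSE part are folded into one rule (a checked edge returns to U exactly when both of its ends are still free). Python sets and recursion are used for brevity, so the sketch does not reproduce the linear-time bookkeeping and may hit the recursion limit on long search paths.

```python
from collections import defaultdict


def lam_matching(weighted_edges):
    """Sketch of LAM; returns the matching as a list of (u, v) pairs."""
    edges = list(weighted_edges)
    ends = [(u, v) for u, v, _ in edges]
    weight = [w for _, _, w in edges]
    in_U = set(range(len(edges)))      # ids of unchecked edges
    unchecked = defaultdict(set)       # vertex -> ids of incident edges in U
    for i, (u, v) in enumerate(ends):
        unchecked[u].add(i)
        unchecked[v].add(i)
    mate = {}                          # matched vertex -> partner
    matching = []

    def take_from_U(i):
        in_U.discard(i)
        for x in ends[i]:
            unchecked[x].discard(i)

    def back_to_U(i):
        in_U.add(i)
        for x in ends[i]:
            unchecked[x].add(i)

    def other(i, x):
        u, v = ends[i]
        return v if x == u else u

    def try_match(e):
        a, b = ends[e]
        checked = {a: [], b: []}       # local sets C(a) and C(b)
        while a not in mate and b not in mate and (unchecked[a] or unchecked[b]):
            for x in (a, b):           # alternate between a and b
                if x not in mate and unchecked[x]:
                    f = next(iter(unchecked[x]))
                    take_from_U(f)     # move from U to C
                    checked[x].append(f)
                    if weight[f] > weight[e]:
                        try_match(f)   # follow the heavier edge
        if a not in mate and b not in mate:
            mate[a], mate[b] = b, a    # {a, b} is locally heaviest
            matching.append((a, b))
        for x in (a, b):
            for f in checked[x]:
                if x not in mate and other(f, x) not in mate:
                    back_to_U(f)       # both ends free: uncheck again
                # otherwise the edge has a matched end and is dropped (set R)

    while in_U:
        e = next(iter(in_U))
        take_from_U(e)                 # assumption: start edge leaves U here
        try_match(e)
    return matching


sample = lam_matching([("a", "b", 2), ("b", "c", 3), ("c", "d", 2), ("d", "a", 1)])
```

Which edges end up in `sample` may depend on the order in which unchecked edges are picked; only the properties stated in the paper (a maximal matching with at least half of the optimum weight) are expected to hold for every order.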



Proof. The procedure 'try match' is only called with an edge {a, b} from U, which
ensures that both vertices are free. The same edge cannot be the parameter in
deeper recursive calls, because the search path only progresses along strictly
higher edge weights. Finally, in the IF-ELSE part of 'try match', either a and/or
b are already matched or they are matched by including edge {a, b} in $M_{LAM}$.
Therefore, after the first call of 'try match({a, b})', a and/or b are matched,
i.e. {a, b} cannot be in U anymore and cannot be called a second time. □

The following lemma shows the status of the edges: 

Lemma 2 (edge status). At the start all edges are not checked (U = E), and
there are no checked or removed edges (R = C = ∅). At any stage, an edge
$\{a,b\} \in E$ is either unchecked ($\in U$), checked ($\in C$) or removed
($\in R$), resulting in $E = U \cup C \cup R$. At the end, all edges are
removed, i.e. U = C = ∅ and R = E.

Proof. The status at the start is clear from the algorithm. The edges are only
moved between the sets U, C and R, ensuring the stated status throughout the
algorithm. The WHILE loop of the main algorithm terminates when U is empty.
Besides, all edges of $C_{\{a,b\}}(a)$ and $C_{\{a,b\}}(b)$, which have been
moved in the WHILE loop of procedure call 'try match({a, b})', are either moved
to R or back to U in the IF-ELSE part of the same call. Therefore, C is also
empty after termination of all 'try match' calls, resulting, with
$E = U \cup C \cup R$, in E = R. □

Lemma 2 ensures that all edges are removed at the end of the algorithm,
because at least one of the incident vertices has been matched. Thus:

Corollary 2 (maximal). The resulting matching $M_{LAM}$ is maximal.

3 1/2-Approximation Quality

In this section the property of locally heaviest edge for any edge added to the
matching and the $\frac{1}{2}$-approximation are shown. A similar proof has
been done in [?] to show the $\frac{1}{2}$-approximation for the GREEDY
algorithm of Figure 1.

Lemma 3 (locally heaviest edge). Algorithm LAM starts with an empty
matching and an edge {a, b} is only added if a and b are free and neither a
nor b is adjacent to a free vertex with an edge of higher weight than {a, b}.

Proof. An edge {a, b} is only added to $M_{LAM}$ in the last ELSE part of
'try match', i.e. a and b are free. Let a be adjacent to a free vertex c (or b
adjacent to a free vertex d). According to Lemma 2, edge {a, c} ({b, d}) may be

in R: Impossible, because then either a or c is already matched.
in U: Impossible, because then the WHILE loop would not have terminated.
in C: The weight of {a, b} is higher than that of all other edges in the search
  path. When edges were checked along the search path, either their weight
  was not higher than the weight of the corresponding path edge, or the search
  path progressed along this edge. In the latter case, either the recursive call
  terminated with at least one incident vertex of that edge being matched, or
  the edge is still in the search path. □



Lemma 4 (1/2-approximation). Algorithm LAM computes a matching $M_{LAM}$
with at least $\frac{1}{2}$ of the edge weight of a maximum weighted matching
$M_{MWM}$.

Proof. Compare $M_{LAM}$ to an arbitrary matching $M_{MWM}$, let $V_{LAM}$ be
the matched vertices of the current $M_{LAM}$ and $V_{MWM}$ be the matched
vertices of $M_{MWM}$. We will show that, throughout the algorithm, the weight
of the current matching $M_{LAM}$ is at least $\frac{1}{2}$ the weight of the
edges of $M_{MWM}$ incident to a vertex of $V_{LAM}$:

$$W(M_{LAM}) \ge \frac{1}{2} W(\{\{u,v\} \in M_{MWM} \mid u \in V_{LAM} \lor v \in V_{LAM}\})$$

It holds for the start ($M_{LAM} := \emptyset$). After adding an edge {a, b} to
$M_{LAM}$, $W(M_{LAM})$ increases by $w(\{a,b\})$, but also the right hand side
may increase. If $\{a,b\} \in M_{MWM}$, the right hand side only increases by
$\frac{1}{2} w(\{a,b\})$. Otherwise, let $\{a,c\}, \{b,d\} \in M_{MWM}$ be the
possible edges adjacent to {a, b}. The choice of matching edge {a, b} excluded
the possible choice of {a, c} and {b, d} throughout the rest of the algorithm.
These are the only two edges by which the subset of $M_{MWM}$ may increase,
i.e. the right hand side may only increase by
$\frac{1}{2}(w(\{a,c\}) + w(\{b,d\}))$. If $c \in V_{LAM}$ ($d \in V_{LAM}$)
before we add edge {a, b}, then {a, c} ({b, d}) is already in the subset of
$M_{MWM}$. If $c \notin V_{LAM}$ ($d \notin V_{LAM}$), i.e. c (d) is free,
Lemma 3 ensures that $w(\{a,b\}) \ge w(\{a,c\})$ ($w(\{a,b\}) \ge w(\{b,d\})$).
Therefore, the value on the right hand side cannot increase by more
than $w(\{a,b\})$.

At the end, algorithm LAM terminates with a maximal matching $M_{LAM}$
(Corollary 2), i.e. for all edges {u, v}, $u \in V_{LAM}$ or $v \in V_{LAM}$.
Therefore,

$$W(M_{LAM}) \ge \frac{1}{2} W(\{\{u,v\} \in M_{MWM} \mid u \in V_{LAM} \lor v \in V_{LAM}\}) = \frac{1}{2} W(M_{MWM}).$$ □

4 Linear Time Requirement 

The time requirement depends on the number of times edges are moved between
the sets U of unchecked, C of checked and R of removed edges. There are only
three ways in which an edge may be moved: (1) in the WHILE loop of 'try match',
previously unchecked edges are checked (moved from U to C); in the IF-ELSE
part, the checked edges are either (2) removed (moved from C to R), or (3)
unchecked again (moved from C to U). Therefore, once an edge is removed and
stored in R, it will not be moved to any other set, ensuring that an edge may
be moved from C to R at most $|E|$ times. Furthermore, an edge may be moved
several times between U and C, but we will show that every time when edges
are moved back from C to U, an almost equal number of edges is moved from
C to R, showing that the number of edges moved between U and C is $O(|E|)$.

Lemma 5 (linear time). Algorithm LAM of Figure 4 runs in $O(|E|)$ time.

Proof. The loop of the main algorithm has at most $|E|$ iterations, because if
it makes a call 'try match({a, b})' with an edge $\{a,b\} \in U$, at least one
of a and b is matched after the completion of the call. As stated in Section 2,
an unchecked edge in U is always incident to two free vertices. Therefore, U
reduces by at least the edge {a, b} in every iteration of the WHILE loop of the
main algorithm. Although new edges may be added to U in the IF-ELSE part of
'try match', those edges have previously been moved in the WHILE loop of the
same procedure. According to Lemma 1, the 'try match' procedure is not called
more than once for every edge $\{a,b\} \in E$. In sum, the time complexity of
the algorithm is:

$$O(|E|) + \sum_{\text{try match}(\{a,b\})} O(\text{try match}(\{a,b\}))$$

The procedure 'try match({a, b})' consists of a WHILE loop and an IF-ELSE part.
In every iteration of the WHILE loop, an edge {a, c} and/or {b, d} is checked
and moved from U to the local sets $C_{\{a,b\}}(a)$ and/or $C_{\{a,b\}}(b)$ of
checked edges, resulting in a time of
$O(|C_{\{a,b\}}(a)| + |C_{\{a,b\}}(b)|)$ for the WHILE loop. The values
$|C_{\{a,b\}}(a)|$ and $|C_{\{a,b\}}(b)|$ refer to the maximum sizes of the
sets $C_{\{a,b\}}(a)$ and $C_{\{a,b\}}(b)$, which occur after the completion of
the WHILE loop. In the IF-ELSE part, the edges of the local sets
$C_{\{a,b\}}(a)$ and $C_{\{a,b\}}(b)$ are either removed (moved to R), or they
are unchecked again (moved to U), which again leads to a time complexity of
$O(|C_{\{a,b\}}(a)| + |C_{\{a,b\}}(b)|)$. Thus, the run time of the algorithm is:

$$O(|E|) + \sum_{\text{try match}(\{a,b\})} O(|C_{\{a,b\}}(a)| + |C_{\{a,b\}}(b)|) \qquad (1)$$

We will show that the number of check operations is not much larger than the
number of remove operations for every call. Let $R_{\{a,b\}}$ be the set of
edges removed in the procedure call 'try match({a, b})', i.e.
$\bigcup_{\text{try match}(\{a,b\})} R_{\{a,b\}} = R$. We
distinguish between the cases of the IF-ELSE part:

1. a and b are matched; a and b are free: In both cases the size of sets
   $C_{\{a,b\}}(a)$ and $C_{\{a,b\}}(b)$ is equal to the number of removed
   edges, i.e.

   $$|C_{\{a,b\}}(a)| + |C_{\{a,b\}}(b)| = |R_{\{a,b\}}| \qquad (2)$$



2. a is matched, b is free: In this case, a was matched with an adjacent
   vertex c in a recursive call of 'try match'. It may be matched in a recursive
   call just one level deeper as shown in Figure 5(i), but it may also be matched
   in a deeper level of recursion as result of a loop in the search path as shown
   in Figure 5(ii). In addition, it may also be matched in a recursive call made
   from vertex b as shown in Figure 5(iii). We will show that the number of
   uncheck operations is not larger than the number of remove operations. Let
   us assume the WHILE loop terminates after i iterations, i.e. a and c were free
   at the start of every iteration and edge {a, c} was matched in the i-th
   iteration. We have to show that an edge $\{a, c_i\} \in U$ was checked in
   every iteration i.
   According to Lemma 2, {a, c} is either in U, C or R at every stage and,
   especially, at the beginning of every iteration of the WHILE loop of the
   procedure call 'try match({a, b})'. If {a, c} is checked, it may either be
   checked from the procedure 'try match({a, b})', or from a procedure call
   earlier in the search path. When edges were checked along the search path,
   either their weight was not higher than the weight of the corresponding path
   edge, or the search path progressed along this edge. In the latter case,
   either the recursive call terminated with at least one incident vertex of
   that edge being matched, or the edge is still in the search path. Therefore,
   if {a, c} is checked, it cannot have a higher weight than {a, b}, but this is
   impossible, because it was matched in iteration i while being the final edge
   in the search path and {a, b} being within the search path. Furthermore,
   {a, c} can also not be removed, because then either a or c is matched.
   Therefore, $\{a, c\} \in U$ at the start of every iteration, i.e. the
   condition of the first IF in the WHILE loop was true in each iteration and an
   edge $\{a, c_i\}$ is checked and moved to $C_{\{a,b\}}(a)$, i.e.
   $|C_{\{a,b\}}(a)| = i$.

   In addition, it is clear that $|C_{\{a,b\}}(b)|$ cannot be more than the
   number of iterations, i.e. $|C_{\{a,b\}}(b)| \le i$, resulting in

   $$|C_{\{a,b\}}(a)| + |C_{\{a,b\}}(b)| \le 2 |C_{\{a,b\}}(a)| \le 2 \cdot |R_{\{a,b\}}| \qquad (3)$$

Fig. 5. Vertices a or b are matched in a recursive call. Edges of consecutive
recursive calls are shown and numbers indicate the level of recursion, i.e. the
edge weight increases with the numbers. (i): a is matched in a recursive call
from itself one level deeper. (ii): a is matched in a recursive call from itself
several levels deeper, forming a loop. (iii): a is matched in a recursive call
from b. (iv): b is matched in a recursive call from a.

3. b is matched, a is free: This case is similar to the previous one, but not
   identical, because a is checked first in the WHILE loop. It may happen that
   b is matched in a recursive call from a as shown in Figure 5(iv). In this
   case, in the final iteration of the WHILE loop, b was already matched after
   completion of the recursive calls from a and no further edge was added to
   $C_{\{a,b\}}(b)$. Consequently, we can only guarantee
   $i \ge |C_{\{a,b\}}(b)| \ge i - 1$. As in the previous case,
   $|C_{\{a,b\}}(a)|$ cannot be more than the number of iterations,
   i.e. $|C_{\{a,b\}}(a)| \le i$, resulting in

   $$|C_{\{a,b\}}(a)| + |C_{\{a,b\}}(b)| \le 2 |C_{\{a,b\}}(b)| + 1 \le 2 \cdot |R_{\{a,b\}}| + 1 \qquad (4)$$

Equations 2, 3 and 4 reduce the overall time complexity of equation 1 to

$$O(|E|) + \sum_{\text{try match}(\{a,b\})} O(2 \cdot |R_{\{a,b\}}| + 1) = O(|E|),$$

because the total number of removed edges cannot exceed $|E|$ and, by Lemma 1,
there are at most $|E|$ calls of 'try match'. □

5 Conclusion 

The new algorithm LAM has been tested experimentally and compared with
other matching heuristics on many large graphs from real applications by Birger
Boyens in [?]. It is implemented in the graph partitioning library PARTY [?],
where it is used for graph coarsening in the multilevel partitioning approach.

I would like to thank Burkhard Monien and Marco Riedel for many fruitful 
discussions on graph matching. 
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Abstract. The above figure shows some classes from the boolean and 
(truth-table) bounded-query hierarchies. It is well-known that if either 
collapses at a given level, then all higher levels collapse to that same level. 
This is a standard “upward translation of equality” that has been known 
for over a decade. The issue of whether these hierarchies can translate 
equality downwards has proven vastly more challenging. In particular, 
with regard to the figure above, consider the following claim: 

$P^{\Sigma_k^p}_{m\text{-tt}} = P^{\Sigma_k^p}_{m+1\text{-tt}} \Rightarrow DIFF_m(\Sigma_k^p) = coDIFF_m(\Sigma_k^p) = BH(\Sigma_k^p)$. (**)

This claim, if true, says that equality translates downwards between 
levels of the bounded-query hierarchy and the boolean hierarchy levels 
that (before the fact) are immediately below them. 

Until recently, it was not known whether (**) ever held, except in the 
trivial m = 0 case. Then Hemaspaandra et al. proved that (**) 
holds for all m, whenever k > 2. For the case k = 2, Buhrman and
Fortnow then showed that (**) holds when m = 1. In this paper,
we prove that for the case k = 2, (**) holds for all values of m. As
Buhrman and Fortnow showed that no relativizable technique can prove
"for k = 1, (**) holds for all m," our achievement of the k = 2 case is
unlikely to be strengthened to k = 1 any time in the foreseeable future.
The new downward translation we obtain tightens the collapse in the 
polynomial hierarchy implied by a collapse in the bounded-query hierar- 
chy of the second level of the polynomial hierarchy. 
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1 Introduction 

Does the collapse of low-complexity classes imply the collapse of
higher-complexity classes? Does the collapse of high-complexity classes imply
the collapse of lower-complexity classes? These questions, known respectively
as upward and downward translation of equality, have long been central topics
in computational complexity theory. For example, in the seminal paper on the
polynomial hierarchy, Meyer and Stockmeyer [?] proved that the polynomial
hierarchy displays upward translation of equality (e.g., $P = NP \Rightarrow P = PH$).

The issue of whether the polynomial hierarchy (its levels and/or bounded
access to its levels) ever displays downward translation of equality has proved
more difficult. The first such result regarding bounded access was recently
obtained by Hemaspaandra, Hemaspaandra, and Hempel [?], who proved that if
for some high level of the polynomial hierarchy one query equals two queries,
then the hierarchy collapses down not just to one query to that level, but rather
to that level itself. That is, they proved the following result (note: the levels
of the polynomial hierarchy are denoted in the standard way, namely,
$\Sigma_0^p = P$, $\Sigma_1^p = NP$, $\Sigma_k^p = NP^{\Sigma_{k-1}^p}$ for each
$k > 1$, and $\Pi_k^p = \{L \mid \overline{L} \in \Sigma_k^p\}$ for each
$k \ge 0$).

Theorem 1. ([?]) For each $k > 2$: If $P^{\Sigma_k^p[1]} = P^{\Sigma_k^p[2]}$,
then $\Sigma_k^p = \Pi_k^p = PH$.

This theorem has two clear directions in which one might hope to strengthen
it. First, one might ask not just about one-versus-two queries but rather about
m-versus-(m + 1) queries. Second, one might ask if the k > 2 can be improved to
k > 1. Both of these have been achieved. The first strengthening was achieved in
a more technical section of the same paper by Hemaspaandra, Hemaspaandra,
and Hempel [?]. They showed that Theorem 1 was just the m = 1 special
case of a more general downward translation result they established, for k > 2,
between bounded access to $\Sigma_k^p$ and the boolean hierarchy over
$\Sigma_k^p$. The second type of strengthening was achieved by Buhrman and
Fortnow [?], who showed that Theorem 1 holds even for k = 2, but who also
showed that no relativizable technique can establish Theorem 1 for k = 1.

Neither of the results or proofs just mentioned is broad enough to achieve
both strengthenings simultaneously. In this paper we present a new result strong
enough to, when used along with another recent downward translation [?],
achieve this (and more). In particular, we unify and extend all the above results,
and from our more general results it easily follows that:

Corollary 1. For each m > 0 and each k > 1 it holds that:
$P^{\Sigma_k^p}_{m\text{-tt}} = P^{\Sigma_k^p}_{m+1\text{-tt}} \Rightarrow DIFF_m(\Sigma_k^p) = coDIFF_m(\Sigma_k^p)$.

In particular, we obtain for the first time the cases
$(k = 2 \land m = 2)$, $(k = 2 \land m = 3)$, $(k = 2 \land m = 4)$, ... .

Informally put, the technical approach of our proof is as follows. In the
previous work extending Theorem 1 to the boolean hierarchy (part 1 of
Theorem 2), the "coordination" difficulties presented by the fact that boolean
hierarchy sets are in effect handled via collections of machines were resolved
via using certain lexicographically extreme objects as clear signposts to signal
machines with. In the current stronger context that approach fails. Instead, we
integrate into the structure of easy-hard-technique proofs (especially those of
[?]) the so-called "telescoping" normal form possessed by the boolean hierarchy
over $\Sigma_k^p$ (for each k, see [?]), which in concept dates back to
Hausdorff's work on algebras of sets, has often proven useful in "controlling"
the complexity-theoretic boolean hierarchies (see, e.g., [?]), and has also been
used in the context of the easy-hard technique (see, e.g., [?]). This normal
form guarantees that if $L \in DIFF_m(\Sigma_k^p)$, then there are sets
$L_1, L_2, \ldots, L_m \in \Sigma_k^p$ such that
$L = L_1 - (L_2 - (L_3 - \cdots (L_{m-1} - L_m) \cdots))$ and
$L_1 \supseteq L_2 \supseteq \cdots \supseteq L_{m-1} \supseteq L_m$.
(Picture, if you will, an archery target with concentric rings of membership and
nonmembership. That is exactly the effect created by this normal form.)
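
The archery-target picture can be checked on a toy example. The following Python sketch uses small finite sets as stand-ins for the languages $L_1, \ldots, L_m$; the function names `in_nested_difference` and `in_by_parity` and the sample sets are assumptions made purely for illustration. For a nested chain, membership in the nested difference is the same as lying in an odd number of the sets, which is what produces the alternating rings.

```python
def in_nested_difference(x, sets):
    """Decide x in L1 - (L2 - (L3 - ... - Lm)), evaluating from the inside out."""
    inside = False
    for s in reversed(sets):
        # x is in (Li - rest) exactly if x is in Li and not in the rest.
        inside = (x in s) and not inside
    return inside


def in_by_parity(x, chain):
    """For a chain L1 ⊇ L2 ⊇ ... ⊇ Lm: x lies in an odd number of the sets."""
    return sum(x in s for s in chain) % 2 == 1


# Three concentric "rings": {0..9} ⊇ {0..5} ⊇ {0..2}.
rings = [set(range(10)), set(range(6)), set(range(3))]
members = [x for x in range(12) if in_nested_difference(x, rings)]
agree = all(in_nested_difference(x, rings) == in_by_parity(x, rings)
            for x in range(12))
```

Here the innermost disc and the outermost ring belong to the set, while the middle ring does not, matching the alternation of membership and nonmembership described above.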

As noted at the end of Section^ the stronger downward translation we obtain 
yields a strengthened collapse of the polynomial hierarchy under the assumption 
of a collapse in the bounded-query hierarchy (by which, throughout this paper, 
we mean the truth-table bounded-query hierarchy) over $\Sigma_2^p$.

We conclude this section with some additional literature pointers. We mention
that the proofs of Theorem 1 and all that grew out of it (including this
paper) are indebted to, and use (strong) extensions of, the "easy-hard"
technique that was invented by Kadin, as further developed by Beigel, Chang,
Ogihara, Kadin, and Wagner [?], to study upward translations of equality
resulting from the collapse of the boolean hierarchy. We also mention that
there is a body of literature showing that equality of exponential-time classes
translates downwards in a limited sense: relationships are obtained with whether
sparse sets collapse within lower time classes (the classic paper in this area is
that of Hartmanis, Immerman, and Sewelson [?], see also [?]; limitations of such
results are presented in [?]). Other than being a restricted type of downward
translation of equality, that body of work has no close connection with
the present paper due to that body of work's applicability only to sparse sets.
Finally, we mention that downward separation is closely tied to the recently
developed theory of "query order" (see the survey [?]).

2 Preliminaries 

To explain exactly what we do and how it extends previous results, we now
state the previous results in the more general forms in which they were actually
established, though in some cases with different notations or statements (see, e.g.,
the interesting recent paper of Wagner [?] regarding the relationship between
"delta notation" and truth-table classes). Before stating the results, we very
briefly remind the reader of some definitions/notations, namely the $\Delta$
levels of the polynomial hierarchy, truth-table access, symmetric difference
classes, and boolean hierarchies. A detailed introduction to the boolean
hierarchy, including its motivation and applications, can be found in [?].

Definition 1. 1. As is standard, for each k>l, denotes (see m)- 

As is standard, for each m > 0 and each set A, P^-tt denotes the class 
of languages accepted by deterministic polynomial-time machines allowed m 
truth-table (i.e., non-adaptive) queries to A (see 

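2. For any classes $\mathcal{C}$ and $\mathcal{D}$, $\mathcal{C} \mathbin{\Delta} \mathcal{D} = \{L \mid (\exists C \in \mathcal{C})(\exists D \in \mathcal{D})[L = C \mathbin{\Delta} D]\}$,
where $C \mathbin{\Delta} D = (C - D) \cup (D - C)$.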

3. (^, see also mm) Let C be any complexity class. We now define the 
levels of the boolean hierarchy. 

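(a) $\mathrm{DIFF}_1(\mathcal{C}) = \mathcal{C}$.

(b) For any $m \ge 1$, $\mathrm{DIFF}_{m+1}(\mathcal{C}) = \{L \mid (\exists L_1 \in \mathcal{C})(\exists L_2 \in \mathrm{DIFF}_m(\mathcal{C}))[L = L_1 - L_2]\}$.

(c) For any $m \ge 1$, $\mathrm{coDIFF}_m(\mathcal{C}) = \{L \mid \overline{L} \in \mathrm{DIFF}_m(\mathcal{C})\}$.

(d) $\mathrm{BH}(\mathcal{C})$, the boolean hierarchy over $\mathcal{C}$, is $\bigcup_{m \ge 1} \mathrm{DIFF}_m(\mathcal{C})$.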

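The relationship between the levels of the boolean hierarchy over $\Sigma_k^p$ and
bounded access to $\Sigma_k^p$ is as follows. For each $k \ge 0$ and each $m \ge 0$,

$\mathrm{P}^{\Sigma_k^p}_{m\text{-tt}} \subseteq \mathrm{DIFF}_{m+1}(\Sigma_k^p) \subseteq \mathrm{P}^{\Sigma_k^p}_{m+1\text{-tt}}$ and
$\mathrm{P}^{\Sigma_k^p}_{m\text{-tt}} \subseteq \mathrm{coDIFF}_{m+1}(\Sigma_k^p) \subseteq \mathrm{P}^{\Sigma_k^p}_{m+1\text{-tt}}$.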

Now we can state what the earlier papers achieved (and, in doing so, those 
papers obtained as corollaries the results mentioned above). 

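Theorem 2. 1. (m) Let $m > 0$, $0 \le i < j < k$, and $i < k - 2$.

If $\mathrm{P}^{\Sigma_i^p[1]} \mathbin{\Delta} \mathrm{DIFF}_m(\Sigma_k^p) = \mathrm{P}^{\Sigma_j^p[1]} \mathbin{\Delta} \mathrm{DIFF}_m(\Sigma_k^p)$, then $\mathrm{DIFF}_m(\Sigma_k^p) = \mathrm{coDIFF}_m(\Sigma_k^p)$.

2. ([5]) If $\mathrm{P} \mathbin{\Delta} \Sigma_2^p = \mathrm{NP} \mathbin{\Delta} \Sigma_2^p$, then $\Sigma_2^p = \Pi_2^p = \mathrm{PH}$.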

3. ( ) If $\Sigma_i^p \mathbin{\Delta} \Sigma_k^p$ is closed under complementation, then the polynomial
hierarchy collapses.^

^ Selivanov establishes only that the hierarchy collapses to a higher level, 

namely a level that contains thus this result is an upward translation of equal- 

ity rather than a downward translation of equality. 
In this paper, we unify all three of the above results, and achieve the
strengthened corollary alluded to above (and stated later as Corollary 2) regarding
the relative power of $m$ and $m + 1$ queries to $\Sigma_k^p$, by proving the following
downward translation of equality result:

Let $m > 0$ and $0 < i < k$. If $\Delta_i^p \mathbin{\Delta} \mathrm{DIFF}_m(\Sigma_k^p) = \Sigma_i^p \mathbin{\Delta} \mathrm{DIFF}_m(\Sigma_k^p)$, then
$\mathrm{DIFF}_m(\Sigma_k^p) = \mathrm{coDIFF}_m(\Sigma_k^p)$.

3 A New Downward Translation of Equality 

We first need a definition and a useful lemma. 

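Definition 2. For any sets $C$ and $D$, $C \mathbin{\tilde{\Delta}} D = \{\langle x, y\rangle \mid x \in C \Leftrightarrow y \notin D\}$.

Lemma 1. If $C$ is $\le_m^p$-complete for $\mathcal{C}$ and $D$ is $\le_m^p$-complete for $\mathcal{D}$, then $C \mathbin{\tilde{\Delta}} D$
is $\le_m^p$-hard for $\mathcal{C} \mathbin{\Delta} \mathcal{D}$.

Proof: Let $L \in \mathcal{C} \mathbin{\Delta} \mathcal{D}$. We need to show that $L \le_m^p C \mathbin{\tilde{\Delta}} D$. Let $\hat{C} \in \mathcal{C}$ and
$\hat{D} \in \mathcal{D}$ be such that $L = \hat{C} \mathbin{\Delta} \hat{D}$. Let $\hat{C} \le_m^p C$ by $f_C$, and $\hat{D} \le_m^p D$ by $f_D$. Then
$x \in L$ iff $x \in \hat{C} \mathbin{\Delta} \hat{D}$, $x \in \hat{C} \mathbin{\Delta} \hat{D}$ iff $(x \in \hat{C} \Leftrightarrow x \notin \hat{D})$, $(x \in \hat{C} \Leftrightarrow x \notin \hat{D})$ iff
$(f_C(x) \in C \Leftrightarrow f_D(x) \notin D)$, and $(f_C(x) \in C \Leftrightarrow f_D(x) \notin D)$ iff $\langle f_C(x), f_D(x)\rangle \in C \mathbin{\tilde{\Delta}} D$. ∎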
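The reduction in the proof of Lemma 1 simply pairs the two given reductions. The following minimal sketch checks that on a toy instance; the concrete sets (even numbers, multiples of 4, thresholds) and the function names are illustrative assumptions, not taken from the paper, and finite predicates merely stand in for the complete sets.

```python
def sym_diff_member(x, in_c, in_d):
    """x is in C Δ D  iff  exactly one of x in C, x in D holds."""
    return in_c(x) != in_d(x)

def tilde_delta_member(pair, in_c, in_d):
    """<x, y> is in C ~Δ D  iff  (x in C  <=>  y not in D)."""
    x, y = pair
    return in_c(x) == (not in_d(y))

def product_reduction(f_c, f_d):
    """Combine reductions f_C and f_D into x -> <f_C(x), f_D(x)>."""
    return lambda x: (f_c(x), f_d(x))

# Hypothetical toy data: C^ = even numbers reduces to C = multiples of 4
# via x -> 2x, and D^ = {x > 10} reduces to D = {y > 20} via x -> 2x.
in_c_hat = lambda x: x % 2 == 0
in_d_hat = lambda x: x > 10
in_c = lambda y: y % 4 == 0
in_d = lambda y: y > 20
g = product_reduction(lambda x: 2 * x, lambda x: 2 * x)

agrees = all(
    sym_diff_member(x, in_c_hat, in_d_hat) == tilde_delta_member(g(x), in_c, in_d)
    for x in range(50)
)
```

Here `agrees` records whether membership in the symmetric difference of the two source sets coincides with membership of the paired image in the tilde-delta set, which is exactly the chain of equivalences in the proof.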

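We now state our main result. (Note that as both $\Delta_i^p$ and $\Sigma_i^p$ contain both
$\emptyset$ and $\Sigma^*$, it is clear that the classes involved in the first equality are at least as
large as the classes involved in the second equality.)

Theorem 3. Let $m > 0$ and $0 < i < k$. If $\Delta_i^p \mathbin{\Delta} \mathrm{DIFF}_m(\Sigma_k^p) = \Sigma_i^p \mathbin{\Delta} \mathrm{DIFF}_m(\Sigma_k^p)$,
then $\mathrm{DIFF}_m(\Sigma_k^p) = \mathrm{coDIFF}_m(\Sigma_k^p)$.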

This result almost follows from Lemma 2 or, to be more accurate, most of
its cases are corollaries of Lemma 2. However, the missing cases, which are by
far the most challenging ones, need to be established, and Theorem 4 below
does exactly that.

Lemma 2. Let $m > 0$ and $0 < i < k - 1$. If $\Sigma_i^p \mathbin{\Delta} \mathrm{DIFF}_m(\Sigma_k^p)$ is closed under
complementation, then $\mathrm{DIFF}_m(\Sigma_k^p) = \mathrm{coDIFF}_m(\Sigma_k^p)$.

We do not prove Lemma 2 here. It was first established in a precursor version
of this paper m, but alternately follows immediately from an even more general
version of Lemma 2^ that was built over that precursor. Note that Theorem 4
does not rely on Lemma 2.

Theorem 4. Let $m > 0$ and $k > 1$. If $\Delta_{k-1}^p \mathbin{\Delta} \mathrm{DIFF}_m(\Sigma_k^p) = \Sigma_{k-1}^p \mathbin{\Delta} \mathrm{DIFF}_m(\Sigma_k^p)$,
then $\mathrm{DIFF}_m(\Sigma_k^p) = \mathrm{coDIFF}_m(\Sigma_k^p)$.

^ Namely: Let $s, m > 0$ and $0 < i < k - 1$. If $\mathrm{DIFF}_s(\Sigma_i^p) \mathbin{\Delta} \mathrm{DIFF}_m(\Sigma_k^p)$ is closed under
complementation, then $\mathrm{DIFF}_m(\Sigma_k^p) = \mathrm{coDIFF}_m(\Sigma_k^p)$ |1 .S| .
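Definition 3. For each $k > 1$, choose any fixed problem that is $\le_m^p$-complete
for $\Sigma_k^p$ and call it $L'_{\Sigma_k^p}$. Now, having fixed such sets, for each $k > 1$ choose one
fixed set $L''_{\Sigma_{k-2}^p}$ that is in $\Sigma_{k-2}^p$ and one fixed set $L'_{\Sigma_{k-1}^p}$ that is $\le_m^p$-complete for
$\Sigma_{k-1}^p$ and that satisfy $L'_{\Sigma_k^p} = \{x \mid (\exists y \in \Sigma^{|x|})(\forall z \in \Sigma^{|x|})[\langle x, y, z\rangle \in L''_{\Sigma_{k-2}^p}]\}$ and

$L'_{\Sigma_{k-1}^p} = \{\langle x, y, z\rangle \mid |x| = |y| \wedge (\exists z')[(|x| = |y| = |zz'|) \wedge \langle x, y, zz'\rangle \notin L''_{\Sigma_{k-2}^p}]\}$.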

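Proof of Theorem 4. Let $L'_{\Sigma_{k-1}^p}$ be as defined in Definition 3,
and let $L_{\Delta_{k-1}^p}$ and $L_{\mathrm{DIFF}_m(\Sigma_k^p)}$ be any fixed $\le_m^p$-complete sets for $\Delta_{k-1}^p$ and
$\mathrm{DIFF}_m(\Sigma_k^p)$, respectively; such languages exist, e.g., via the standard canonical
complete set constructions using enumerations of clocked machines. From
Lemma 1 it follows that $L_{\Delta_{k-1}^p} \mathbin{\tilde{\Delta}} L_{\mathrm{DIFF}_m(\Sigma_k^p)}$ is $\le_m^p$-hard for $\Delta_{k-1}^p \mathbin{\Delta} \mathrm{DIFF}_m(\Sigma_k^p)$.
(Though this is not needed for this proof, we note in passing that it also can
be easily seen to be in $\Delta_{k-1}^p \mathbin{\Delta} \mathrm{DIFF}_m(\Sigma_k^p)$, and so it is in fact $\le_m^p$-complete for
$\Delta_{k-1}^p \mathbin{\Delta} \mathrm{DIFF}_m(\Sigma_k^p)$.) Since $L'_{\Sigma_{k-1}^p} \mathbin{\tilde{\Delta}} L_{\mathrm{DIFF}_m(\Sigma_k^p)} \in \Sigma_{k-1}^p \mathbin{\Delta} \mathrm{DIFF}_m(\Sigma_k^p)$ and by
assumption $\Delta_{k-1}^p \mathbin{\Delta} \mathrm{DIFF}_m(\Sigma_k^p) = \Sigma_{k-1}^p \mathbin{\Delta} \mathrm{DIFF}_m(\Sigma_k^p)$, there exists a polynomial-time
many-one reduction $h$ from $L'_{\Sigma_{k-1}^p} \mathbin{\tilde{\Delta}} L_{\mathrm{DIFF}_m(\Sigma_k^p)}$ to $L_{\Delta_{k-1}^p} \mathbin{\tilde{\Delta}} L_{\mathrm{DIFF}_m(\Sigma_k^p)}$ (in
light of the latter’s $\le_m^p$-hardness). So, for all $x_1, x_2 \in \Sigma^*$: if $h(\langle x_1, x_2\rangle) =
\langle y_1, y_2\rangle$, then $(x_1 \in L'_{\Sigma_{k-1}^p} \Leftrightarrow x_2 \notin L_{\mathrm{DIFF}_m(\Sigma_k^p)})$ if and only if $(y_1 \in L_{\Delta_{k-1}^p} \Leftrightarrow
y_2 \notin L_{\mathrm{DIFF}_m(\Sigma_k^p)})$. Equivalently, for all $x_1, x_2 \in \Sigma^*$: if $h(\langle x_1, x_2\rangle) = \langle y_1, y_2\rangle$, then

$(x_1 \in L'_{\Sigma_{k-1}^p} \Leftrightarrow x_2 \in L_{\mathrm{DIFF}_m(\Sigma_k^p)})$ iff $(y_1 \in L_{\Delta_{k-1}^p} \Leftrightarrow y_2 \in L_{\mathrm{DIFF}_m(\Sigma_k^p)})$. (**)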

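We can use $h$ to recognize some of $\overline{L_{\mathrm{DIFF}_m(\Sigma_k^p)}}$ by a $\mathrm{DIFF}_m(\Sigma_k^p)$ algorithm.
In particular, we say that a string $x$ is easy for length $n$ if there exists a string $x_1$
such that $|x_1| \le n$ and $(x_1 \in L'_{\Sigma_{k-1}^p} \Leftrightarrow y_1 \notin L_{\Delta_{k-1}^p})$ where $h(\langle x_1, x\rangle) = \langle y_1, y_2\rangle$.

Let $p$ be a fixed polynomial, which will be exactly specified later in the
proof. We have the following algorithm to test whether $x \in \overline{L_{\mathrm{DIFF}_m(\Sigma_k^p)}}$ in the
case that (our input) $x$ is an easy string for $p(|x|)$. Guess $x_1$ with $|x_1| \le p(|x|)$, let
$h(\langle x_1, x\rangle) = \langle y_1, y_2\rangle$, and accept if and only if $(x_1 \in L'_{\Sigma_{k-1}^p} \Leftrightarrow y_1 \notin L_{\Delta_{k-1}^p})$ and
$y_2 \in L_{\mathrm{DIFF}_m(\Sigma_k^p)}$. This algorithm is not necessarily a $\mathrm{DIFF}_m(\Sigma_k^p)$ algorithm,
but it does inspire the following $\mathrm{DIFF}_m(\Sigma_k^p)$ algorithm to test whether $x \in
\overline{L_{\mathrm{DIFF}_m(\Sigma_k^p)}}$ in the case that $x$ is an easy string for $p(|x|)$.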

Let $L_1, L_2, \ldots, L_m$ be languages in $\Sigma_k^p$ such that $L_{\mathrm{DIFF}_m(\Sigma_k^p)} = L_1 - (L_2 -
(L_3 - \cdots (L_{m-1} - L_m) \cdots))$ and $L_1 \supseteq L_2 \supseteq \cdots \supseteq L_{m-1} \supseteq L_m$ (this can be
done, as it is simply the “telescoping” normal form of the levels of the boolean
hierarchy over $\Sigma_k^p$, see jtillHldOj). For $1 \le r \le m$, define $L'_r$ as the language

® By Stockmeyer ’s m standard quantifier characterization of the polynomial hierar- 
chy’s levels, there do exist sets satisfying this definition. 

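^ To understand what is going on here, simply note that if $(x_1 \in L'_{\Sigma_{k-1}^p} \Leftrightarrow y_1 \notin L_{\Delta_{k-1}^p})$
holds then by equation (**) we have $x \in \overline{L_{\mathrm{DIFF}_m(\Sigma_k^p)}} \Leftrightarrow y_2 \in L_{\mathrm{DIFF}_m(\Sigma_k^p)}$. Note also
that both of $x_1 \in L'_{\Sigma_{k-1}^p}$ and $y_1 \notin L_{\Delta_{k-1}^p}$ can be very easily tested by a machine
that has a $\Sigma_{k-1}^p$ oracle.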
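accepted by the following machine: On input $x$, guess $x_1$ with $|x_1| \le p(|x|)$,
let $h(\langle x_1, x\rangle) = \langle y_1, y_2\rangle$, and accept if and only if $(x_1 \in L'_{\Sigma_{k-1}^p} \Leftrightarrow y_1 \notin L_{\Delta_{k-1}^p})$
and $y_2 \in L_r$.

Note that $L'_r \in \Sigma_k^p$ for each $r$, and that $L'_1 \supseteq L'_2 \supseteq \cdots \supseteq L'_{m-1} \supseteq L'_m$. We
will show that if $x$ is an easy string for length $p(|x|)$, then $x \in \overline{L_{\mathrm{DIFF}_m(\Sigma_k^p)}}$ if and
only if $x \in L'_1 - (L'_2 - (L'_3 - \cdots (L'_{m-1} - L'_m) \cdots))$.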

So suppose that $x$ is an easy string for $p(|x|)$. Define $r'$ to be the unique
integer such that (a) $0 \le r' \le m$, (b) $x \in L'_s$ for $1 \le s \le r'$, and (c) $x \notin L'_s$ for
$s > r'$. It is immediate that $x \in L'_1 - (L'_2 - (L'_3 - \cdots (L'_{m-1} - L'_m) \cdots))$ if and
only if $r'$ is odd.
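The parity claim for the telescoping normal form can be checked mechanically. The sketch below is only an illustration under the stated assumption that the sets form a descending chain, so that a string lies in exactly the first r of them; the function name and the list encoding of memberships are hypothetical.

```python
def nested_difference(memberships):
    """Decide x in L1 - (L2 - (L3 - ... (L_{m-1} - L_m)...)).

    memberships[s] is True iff x belongs to the (s+1)-th set of the chain.
    The expression is evaluated from the innermost set outwards.
    """
    result = False
    for member in reversed(memberships):
        result = member and not result
    return result

# For a descending chain, x lies in exactly the first r sets.
parity_holds = all(
    nested_difference([True] * r + [False] * (m - r)) == (r % 2 == 1)
    for m in range(1, 8)
    for r in range(m + 1)
)
```

Evaluating from the inside out mirrors the definition of the levels of the boolean hierarchy, and `parity_holds` confirms on small chains that membership in the nested difference depends only on whether r is odd.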

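Let $w$ be some string such that $(\exists x_1 \in (\Sigma^*)^{\le p(|x|)})(\exists y_1)[h(\langle x_1, x\rangle) = \langle y_1, w\rangle \wedge
(x_1 \in L'_{\Sigma_{k-1}^p} \Leftrightarrow y_1 \notin L_{\Delta_{k-1}^p})]$, and $w \in L_{r'}$ if $r' > 0$. Note that such a $w$ exists,
since $x$ is easy for $p(|x|)$. By the definition of $r'$ (namely, since $x \notin L'_s$ for $s > r'$),
$w \notin L_s$ for all $s > r'$. It follows that $w \in L_{\mathrm{DIFF}_m(\Sigma_k^p)}$ if and only if $r'$ is odd.

It is clear, keeping in mind the definition of $h$, that $x \in \overline{L_{\mathrm{DIFF}_m(\Sigma_k^p)}}$ iff
$w \in L_{\mathrm{DIFF}_m(\Sigma_k^p)}$, $w \in L_{\mathrm{DIFF}_m(\Sigma_k^p)}$ iff $r'$ is odd, and $r'$ is odd iff $x \in L'_1 -
(L'_2 - (L'_3 - \cdots (L'_{m-1} - L'_m) \cdots))$. This completes the case where $x$ is easy, as
$L'_1 - (L'_2 - (L'_3 - \cdots (L'_{m-1} - L'_m) \cdots))$ in effect specifies a $\mathrm{DIFF}_m(\Sigma_k^p)$ algorithm.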

We say that $x$ is hard for length $n$ if $|x| \le n$ and $x$ is not easy for length $n$,
i.e., if $|x| \le n$ and for all $x_1$ with $|x_1| \le n$, $(x_1 \in L'_{\Sigma_{k-1}^p} \Leftrightarrow y_1 \in L_{\Delta_{k-1}^p})$, where
$h(\langle x_1, x\rangle) = \langle y_1, y_2\rangle$. Note that if $x$ is hard for $p(|x|)$, then $x \notin L'_1$.

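If $x$ is a hard string for length $p(|x|)$, then $x$ induces a many-one reduction
from $(L'_{\Sigma_{k-1}^p})^{\le p(|x|)}$ to $L_{\Delta_{k-1}^p}$, namely, $f(x_1) = y_1$, where $h(\langle x_1, x\rangle) = \langle y_1, y_2\rangle$.
(Note that $f$ is computable in time polynomial in $\max(|x|, |x_1|)$.) So it is not hard
to see that if we choose $p$ appropriately large, then a hard string $x$ for $p(|x|)$
induces $\Sigma_{k-1}^p$ algorithms for $(L_1)^{=|x|}, (L_2)^{=|x|}, \ldots, (L_m)^{=|x|}$ (essentially since
each is in $\Sigma_k^p = \mathrm{NP}^{\Sigma_{k-1}^p}$, $L'_{\Sigma_{k-1}^p}$ is $\le_m^p$-complete for $\Sigma_{k-1}^p$, and $\mathrm{NP}^{\Delta_{k-1}^p} = \Sigma_{k-1}^p$),
which we can use to obtain a $\mathrm{DIFF}_m(\Sigma_{k-1}^p)$ algorithm for $(L_{\mathrm{DIFF}_m(\Sigma_k^p)})^{=|x|}$, and thus
certainly a $\mathrm{DIFF}_m(\Sigma_k^p)$ algorithm for $(\overline{L_{\mathrm{DIFF}_m(\Sigma_k^p)}})^{=|x|}$.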



However, there is a problem. The problem is that we cannot combine the
$\mathrm{DIFF}_m(\Sigma_k^p)$ algorithms for easy and hard strings into one $\mathrm{DIFF}_m(\Sigma_k^p)$ algorithm
for $\overline{L_{\mathrm{DIFF}_m(\Sigma_k^p)}}$ that works for all strings. Why? It is too difficult to decide whether
a string is easy or hard; to decide this deterministically takes one query to $\Sigma_k^p$,
and we cannot do that in a $\mathrm{DIFF}_m(\Sigma_k^p)$ algorithm. This is also the reason why
the methods from failed to prove that if $\mathrm{P} \mathbin{\Delta} \Sigma_2^p = \mathrm{NP} \mathbin{\Delta} \Sigma_2^p$, then $\Sigma_2^p = \Pi_2^p$.
Recall from the introduction that the latter theorem was proven by Buhrman
and Fortnow [5]. We will use their technique at this point. The following lemma,
which we will prove after we have finished the proof of this theorem, states a
generalized version of the technique from [5]. It has been generalized to deal with
arbitrary levels of the polynomial hierarchy and to be useful in settings involving
boolean hierarchies.
Lemma 3. Let $k > 1$. For all $L \in \Sigma_k^p$ there exist a polynomial $q$ and a set
$\hat{L} \in \Pi_{k-1}^p$ such that

1. for each natural number $n'$, $q(n') \ge n'$,

2. $\hat{L} \subseteq \overline{L}$, and

3. if $x$ is hard for $q(|x|)$, then $x \in \overline{L}$ iff $x \in \hat{L}$.

Due to space limitations, we refer the reader to the full version of this paper
for the proof of Lemma 3 [ia. From Lemma 3 it follows that there exist
sets $\hat{L}_1, \hat{L}_2, \ldots, \hat{L}_m \in \Pi_{k-1}^p$ and polynomials $q_1, q_2, \ldots, q_m$ with the following
properties for all $1 \le r \le m$:

1. $\hat{L}_r \subseteq \overline{L_r}$, and

2. if $x$ is hard for $q_r(|x|)$, then $x \in \overline{L_r}$ iff $x \in \hat{L}_r$.

Take $p$ to be an (easy-to-compute; we may without loss of generality require
that there is an $i$ such that it is of the form $n^i + i$) polynomial such that $p$ is at
least as large as all the $q_r$s, i.e., such that, for each natural number $n'$, we have
$p(n') \ge \max\{q_1(n'), \ldots, q_m(n')\}$. By the definition of hardness and condition 1
of Lemma 3, if $x$ is hard for $p(|x|)$ then $x$ is hard for $q_r(|x|)$ for all $1 \le r \le m$.
As promised earlier, we have now specified $p$. Define $\hat{L}_{\mathrm{DIFF}_m(\Sigma_k^p)}$ as follows: On
input $x$, guess $r$, $r$ even, $0 \le r \le m$, and accept if and only if both (a) $x \in L_r$ or
$r = 0$, and (b) if $r < m$ then $x \in \hat{L}_{r+1}$. Clearly, $\hat{L}_{\mathrm{DIFF}_m(\Sigma_k^p)} \in \Sigma_k^p$. In addition,
this set inherits certain properties from the $\hat{L}_r$s. In particular, in light of the
definition of $\hat{L}_{\mathrm{DIFF}_m(\Sigma_k^p)}$, the definitions of the $\hat{L}_r$s, and the fact that:

$x \in \overline{L_{\mathrm{DIFF}_m(\Sigma_k^p)}}$ iff for some even $r$, $0 \le r \le m$, we have: ($x \in L_r$ or
$r = 0$) and ($x \in \overline{L_{r+1}}$ or $r = m$),

we have that the following properties hold: (1) $\hat{L}_{\mathrm{DIFF}_m(\Sigma_k^p)} \subseteq \overline{L_{\mathrm{DIFF}_m(\Sigma_k^p)}}$, and
(2) if $x$ is hard for $p(|x|)$, then $x \in \overline{L_{\mathrm{DIFF}_m(\Sigma_k^p)}}$ iff $x \in \hat{L}_{\mathrm{DIFF}_m(\Sigma_k^p)}$.

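Finally, we are ready to give the algorithm. Recall that $L'_1, L'_2, \ldots, L'_m$ are sets
in $\Sigma_k^p$ such that: (1) $L'_1 \supseteq L'_2 \supseteq \cdots \supseteq L'_{m-1} \supseteq L'_m$; (2) if $x$ is easy for $p(|x|)$,
then $x \in \overline{L_{\mathrm{DIFF}_m(\Sigma_k^p)}}$ if and only if $x \in L'_1 - (L'_2 - (L'_3 - \cdots (L'_{m-1} - L'_m) \cdots))$; and
(3) if $x$ is hard for $p(|x|)$, then $x \notin L'_1$. We claim that for all $x$, $x \in \overline{L_{\mathrm{DIFF}_m(\Sigma_k^p)}}$
iff $x \in (L'_1 \cup \hat{L}_{\mathrm{DIFF}_m(\Sigma_k^p)}) - (L'_2 - (L'_3 - \cdots (L'_{m-1} - L'_m) \cdots))$, which completes
the proof of Theorem 4, as $\Sigma_k^p$ is closed under union.

($\Rightarrow$): If $x$ is easy for $p(|x|)$, then $x \in L'_1 - (L'_2 - (L'_3 - \cdots (L'_{m-1} - L'_m) \cdots))$,
and so certainly $x \in (L'_1 \cup \hat{L}_{\mathrm{DIFF}_m(\Sigma_k^p)}) - (L'_2 - (L'_3 - \cdots (L'_{m-1} - L'_m) \cdots))$. If
$x$ is hard for $p(|x|)$, then $x \in \hat{L}_{\mathrm{DIFF}_m(\Sigma_k^p)}$ and $x \notin L'_r$ for all $r$ (since $x \notin L'_1$ and
$L'_1 \supseteq L'_2 \supseteq \cdots$). Thus, $x \in (L'_1 \cup \hat{L}_{\mathrm{DIFF}_m(\Sigma_k^p)}) - (L'_2 - (L'_3 - \cdots (L'_{m-1} - L'_m) \cdots))$.

($\Leftarrow$): Suppose $x \in (L'_1 \cup \hat{L}_{\mathrm{DIFF}_m(\Sigma_k^p)}) - (L'_2 - (L'_3 - \cdots (L'_{m-1} - L'_m) \cdots))$. If
$x \in \hat{L}_{\mathrm{DIFF}_m(\Sigma_k^p)}$, then $x \in \overline{L_{\mathrm{DIFF}_m(\Sigma_k^p)}}$. If $x \notin \hat{L}_{\mathrm{DIFF}_m(\Sigma_k^p)}$, then $x \in L'_1 - (L'_2 -
(L'_3 - \cdots (L'_{m-1} - L'_m) \cdots))$ and so $x$ must be easy for $p(|x|)$ (as $x \in L'_1$, and this is
possible only if $x$ is easy for $p(|x|)$). However, this says that $x \in \overline{L_{\mathrm{DIFF}_m(\Sigma_k^p)}}$. ∎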






Edith Hemaspaandra, Lane A. Hemaspaandra, and Harald Hempel 



4 Conclusions 

We have proven a general downward translation of equality sufficient to yield, 
as a corollary: 

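Corollary 2. For each $m > 0$ and each $k > 1$ it holds that:

$\mathrm{P}^{\Sigma_k^p}_{m\text{-tt}} = \mathrm{P}^{\Sigma_k^p}_{m+1\text{-tt}} \Rightarrow \mathrm{DIFF}_m(\Sigma_k^p) = \mathrm{coDIFF}_m(\Sigma_k^p)$.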

Corollary 2 itself has an interesting further consequence. From this corollary,
it follows that for a number of previously missing cases (namely, when $m > 1$ and
$k = 2$), the hypothesis $\mathrm{P}^{\Sigma_k^p}_{m\text{-tt}} = \mathrm{P}^{\Sigma_k^p}_{m+1\text{-tt}}$ implies that the polynomial hierarchy
collapses to about one level lower in the boolean hierarchy over $\Sigma_{k+1}^p$ than could
be concluded from previous papers. This is because we can, thanks to Corollary 2,
when given $\mathrm{P}^{\Sigma_k^p}_{m\text{-tt}} = \mathrm{P}^{\Sigma_k^p}_{m+1\text{-tt}}$, invoke the powerful collapses of the polynomial
hierarchy that are known to follow from $\mathrm{DIFF}_m(\Sigma_k^p) = \mathrm{coDIFF}_m(\Sigma_k^p)$.
Regarding what collapses do follow from $\mathrm{DIFF}_m(\Sigma_k^p) = \mathrm{coDIFF}_m(\Sigma_k^p)$, a long
line of research started by Kadin and Wagner a decade ago has studied that,
and the strongest currently known connection was recently obtained, independently,
by Hemaspaandra et al. and by Reith and Wagner [HI22I, namely, they
proved: For all $m > 0$ and all $k > 0$, if $\mathrm{DIFF}_m(\Sigma_k^p) = \mathrm{coDIFF}_m(\Sigma_k^p)$ then
$\mathrm{PH} = \mathrm{DIFF}_m(\Sigma_k^p) \mathbin{\Delta} \mathrm{DIFF}_{m-1}(\Sigma_{k+1}^p)$. Putting all the above together, one sees
that, for all cases where $m > 1$ and $k > 1$, $\mathrm{P}^{\Sigma_k^p}_{m\text{-tt}} = \mathrm{P}^{\Sigma_k^p}_{m+1\text{-tt}}$ implies that the
polynomial hierarchy collapses to $\mathrm{DIFF}_m(\Sigma_k^p) \mathbin{\Delta} \mathrm{DIFF}_{m-1}(\Sigma_{k+1}^p)$; of course, for
the case $m = 1$, we already know f I hpiij that, for $k > 1$, if $\mathrm{P}^{\Sigma_k^p[1]} = \mathrm{P}^{\Sigma_k^p[2]}$ then
$\Sigma_k^p = \Pi_k^p = \mathrm{PH}$.

Acknowledgments: We thank the referees for helpful comments. 
Abstract. We show that if an NP-complete set or a coNP-complete
set is polynomial-time disjunctive truth-table reducible to a sparse set
then $\mathrm{FP}^{\mathrm{NP}}_{\|} = \mathrm{FP}^{\mathrm{NP}}[\log]$. Similarly, we show that if SAT is $O(\log n)$-approximable
then $\mathrm{FP}^{\mathrm{NP}}_{\|} = \mathrm{FP}^{\mathrm{NP}}[\log]$. Since $\mathrm{FP}^{\mathrm{NP}}_{\|} = \mathrm{FP}^{\mathrm{NP}}[\log]$ implies
that SAT is $O(\log n)$-approximable [BFT97], it follows from our result
that these two hypotheses are equivalent. We also show that if an NP-complete
set or a coNP-complete set is disjunctively reducible to a sparse
set of polylogarithmic density then, in fact, P = NP.



1 Introduction 

The study of the existence of sparse hard sets for complexity classes has occu- 
pied complexity theorists for over two decades. The first results in this area were 
motivated by the Berman-Hartmanis isomorphism conjecture [BH77] and by the
study of connections between uniform and nonuniform complexity classes [KL80].
The focus shifted to proving, for various reducibilities^ (whose strengths lie between
the many-one and the Turing reducibility), that P = NP is equivalent to
SAT being reducible to a sparse set via such a reducibility. It is now known (see 
the recent survey [CO97]) for several reducibilities that P = NP is equivalent to
SAT being reducible to a sparse set via such a reducibility: a well-known example
here is the result for the case of bounded truth-table reducibility [OW91].
However, it remains a challenging open problem to prove that P = NP if there 
is a sparse Turing-hard set for NP. Indeed, this question remains open even for 
reducibilities stronger than the Turing reducibility. 

In this paper we consider the question of existence of sparse hard sets for NP 
w. r. t. disjunctive truth-table reductions. We briefly recall some known results: 
it is shown in [AKM96] that if there is a sparse hard set for NP under disjunctive
reductions then PH collapses to More recently, it is shown in

^ All reducibilities considered in this paper are polynomial-time computable. 
that if there are sparse hard sets for NP under the disjunctive reducibility then
RP = NP. The proof technique in is based, in turn, on powerful alge- 
braic techniques from tailored for application in the area of reductions to 

sparse sets. With these techniques some long standing conjectures of Hartmanis 
regarding logspace and $\mathrm{NC}^1$ reductions to sparse sets have recently been settled
(the survey [CO97] contains a nice overview of these results).

A Summary of the Results 

The main contribution of this paper is to relate the question of the existence of 
sparse hard sets for NP under disjunctive reductions to other, apparently different,
hypotheses in complexity theory considered in the recent work of Buhrman
et al. [BFT97]. Among these are the following.

(1) P = NP. 

(2) $\mathrm{FP}^{\mathrm{NP}}_{\|} = \mathrm{FP}^{\mathrm{NP}}[\log]$.

(3) SAT is $O(\log n)$-approximable.

(4) (1SAT, SAT) has a solution in P.

Clearly, hypothesis (1) implies the others. Furthermore, it is shown that
(2) implies (3) in [BFT97]. More recently, Sivakumar [Si98], using algebraic
techniques from [ALRS92], has shown that (3) implies (4).

It is known that RP = NP follows from (4) [VV86], and it is an outstanding
open problem in structural complexity whether P = NP follows from any of the
hypotheses (2), (3) or (4). This is the main motivation for studying them. Cai,
Naik and Sivakumar (in the technical report version of [CNS96]) have shown
that if SAT is disjunctively reducible to a sparse set then hypothesis (4) holds.
Building on the technique of [CNS96] we prove the following new results:

- If SAT or $\overline{\mathrm{SAT}}$ is disjunctively reducible to a sparse set then $\mathrm{FP}^{\mathrm{NP}}_{\|} =
\mathrm{FP}^{\mathrm{NP}}[\log]$.^

- For any prime $k$, if $\mathrm{Mod}_k\mathrm{P}$ is disjunctively reducible to a sparse set then
(1SAT, SAT) has a solution in P.

There are collapse results that follow from hypothesis (2) that are not known
to follow from (4). For example, in [JT95] it is shown that if $\mathrm{FP}^{\mathrm{NP}}_{\|} = \mathrm{FP}^{\mathrm{NP}}[\log]$
then a polylogarithmic amount of nondeterminism can be simulated in polynomial
time. Combining this with our result yields as corollary that if SAT or $\overline{\mathrm{SAT}}$
is disjunctively reducible to a sparse set of polylogarithmic density then P = NP.
With related techniques we observe some consequences of SAT being majority
reducible to a sparse set that are discussed in the last section of the paper.

With arguments similar to those we use for proving the above results on
sparse sets, but now combined with Sivakumar’s technique used in [Si98], we
show:

- If SAT is $O(\log n)$-approximable then $\mathrm{FP}^{\mathrm{NP}}_{\|} = \mathrm{FP}^{\mathrm{NP}}[\log]$.

^ SAT is the canonical NP-complete set, defined as the set of satisfiable Boolean for- 
mulas. 
This proves that hypotheses (2) and (3) are equivalent, answering an open
question in [BFT97]. From these results we conclude that both these hypotheses
are at least as weak as SAT being disjunctively reducible to a sparse set.

2 Preliminaries 

We fix the alphabet $\Sigma = \{0,1\}$. The set $\bigcup_{0 \le i \le n} \Sigma^i$ of all strings in $\Sigma^*$ of
length up to $n$ is denoted by $\Sigma^{\le n}$. For any set $A \subseteq \Sigma^*$, $A^{\le n} = A \cap \Sigma^{\le n}$
and $A^{=n} = A \cap \Sigma^n$. $\chi_A$ denotes the characteristic function of $A$. By abuse
of notation, let $\chi_A(x_1, x_2, \ldots, x_m)$ denote the function that maps the list of
strings $x_1, x_2, \ldots, x_m$ to the $m$-bit vector whose $i$th bit is $\chi_A(x_i)$. The length
of a string $x$ is denoted by $|x|$, and the cardinality of a set $A$ is denoted by
$\|A\|$. The density function of a set $A$ is defined as $d_A(n) = \|A^{\le n}\|$. A set $S$
is sparse if its density function is bounded above by a polynomial. A sparse
set has polylog density if its density function is bounded above by $\log^k n$ for
some constant $k > 0$. The complement of a language $A$ is denoted by $\overline{A}$. Let
$\langle \cdot, \cdot \rangle$ denote a standard polynomial-time computable, one-to-one and polynomial-time
invertible pairing function which can be extended in a standard fashion to
encode arbitrary sequences $(x_1, \ldots, x_k)$ of strings into a string $\langle x_1, \ldots, x_k \rangle$.
All reducibilities in this paper are polynomial-time computable. Apart from
the many-one reducibility, we consider the disjunctive truth-table reducibility:
A set $A$ is disjunctively reducible to a set $B$, if there is a polynomial-time computable
function $f$ mapping strings to sets of strings such that for all $x \in \Sigma^*$
it holds that $x \in A \Leftrightarrow f(x) \cap B \neq \emptyset$. Let SAT denote the set of satisfiable
Boolean formulas. We next define promise problems.

Definition 1. [ESY84] A promise problem is a pair of sets $(Q, R)$. A set $L$ is
called a solution of the promise problem $(Q, R)$ if for all $x \in Q$, $x \in L \Leftrightarrow x \in R$.

Of particular interest to us is the promise problem (1SAT, SAT), where 1SAT
contains precisely those Boolean formulas which have at most one satisfying
assignment. Observe that any solution of the promise problem (1SAT, SAT) has
to agree with SAT in the formulas having a unique satisfying assignment as well
as in the unsatisfiable formulas. We next recall the definition of approximable
sets.®

Definition 2. [BKS95] A function $g$ is an $f$-approximator for a set $A$ if for
every $x_1, x_2, \ldots, x_m$ with $m \ge f(\max_i |x_i|)$,

$g(x_1, x_2, \ldots, x_m) \in \Sigma^m$, and

$g(x_1, x_2, \ldots, x_m) \neq \chi_A(x_1, x_2, \ldots, x_m)$.

A set $A$ is called $f$-approximable if it has an $f$-approximator.

® Approximability is called membership comparability in [Og95].
$\mathrm{FP}^{\mathrm{NP}}_{\|}$ denotes the class of functions computable in polynomial time with
parallel queries to an NP oracle and $\mathrm{FP}^{\mathrm{NP}}[\log]$ denotes the class of functions
computable in polynomial time with logarithmically many adaptive queries to
an NP oracle.

Other notions from complexity theory used in this paper can be found in
standard textbooks like [BDG88, Pa94].

3 Sparse Sets and Parallel Queries to NP 

As preparation for the first result we prove the following lemma which is essentially
implicit in [CNS96]. We are interested in solving the following decoding
problem which we call the hidden polynomial problem: We are given as input a
prime $q$, integers $t$, $n$ and $N$ such that $q > (n+1)t$, and $R_1, R_2, \ldots, R_N$, where
$R_i \subseteq F_q \times F_q$, with the promise that there exists a polynomial $P$ over $F_q$ of degree
at most $n$ and a subset $I \subseteq \{1, 2, \ldots, N\}$ of size $t$ such that $\bigcup_{i \in I} R_i = \mathrm{graph}(P)$.
We want to compute a small set of polynomials over $F_q$ containing $P$ (with each
polynomial output as a list of its coefficients).

Henceforth, we refer to the collection $R_1, R_2, \ldots, R_N$ as a table and the $R_i$’s
as its rows. Observe that the rows do not have to be all of the same size. We call
a row $R_i$ correct w. r. t. the polynomial $P$ if $R_i \subseteq \mathrm{graph}(P)$.

The following lemma (based on [CNS96]) gives a precise answer to the above
decoding problem.

Lemma 1. There is an algorithm (that runs in time polynomial in $n$, $N$, and
$q$) that takes as input $q$, $t$, $n$, $N$, and a table $T = \{R_1, R_2, \ldots, R_N\}$ of $N$ rows as
described above, and outputs a list of at most $N$ polynomials, one of which is the
hidden polynomial.

Proof. Notice that there are exactly $q^2$ pairs $(u, v)$, $u, v \in F_q$, of which exactly
$q$ pairs correctly define the graph of the hidden polynomial $P$. Since there is a
set of $t$ correct rows in the table $T$ which completely specify the polynomial, by
the pigeon-hole principle there is one correct row which contains at least $q/t$ pairs.
Furthermore, notice that no correct row contains inconsistent pairs $(u, v)$ and
$(u, w)$ where $v \neq w$.

Call a row of the table long if it has at least $q/t$ pairs and does not contain
any inconsistent pair. We know that there is at least one correct row which is
long.

Writing the hidden polynomial $P(x)$ as $\sum_{i=0}^{n} a_i x^i$ we notice that each long
row gives us a system of at least $q/t$ linear equations in the $n + 1$ unknowns
$a_i$, $0 \le i \le n$. Since $q/t > n + 1$, we can pick any $n + 1$ of the equations
corresponding to a given long row which will have a unique solution in the $a_i$’s
since the coefficient matrix is a Vandermonde matrix which is invertible. Using
Gaussian elimination we can efficiently compute this unique solution for each
long row.

This yields a list of at most $N$ polynomials, one for each long row in the
table $T$, and we know that the hidden polynomial (corresponding to a correct
long row of $T$) is in this list.
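The procedure in the proof of Lemma 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the representation of rows as lists of pairs, and the choice of the first n + 1 points of each long row are assumptions made for the example, and the modular inverse via `pow(x, -1, q)` requires Python 3.8 or later.

```python
def solve_mod(A, b, q):
    """Gauss-Jordan elimination over F_q for an invertible square system A a = b."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        # An invertible matrix always has a nonzero pivot in this column.
        piv = next(r for r in range(col, n) if M[r][col] % q != 0)
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], -1, q)
        M[col] = [(v * inv) % q for v in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                factor = M[r][col]
                M[r] = [(vr - factor * vc) % q for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def is_long(row, q, t):
    """A row is long if it has at least q/t pairs and no two pairs share a first coordinate."""
    us = [u for u, _ in row]
    return len(row) * t >= q and len(set(us)) == len(us)

def candidate_polynomials(q, t, n, table):
    """Return one coefficient list [a_0, ..., a_n] per long row of the table."""
    candidates = []
    for row in table:
        row = sorted(set(row))
        if not is_long(row, q, t):
            continue
        points = row[: n + 1]  # q > (n+1)t guarantees at least n+1 points
        A = [[pow(u, i, q) for i in range(n + 1)] for u, _ in points]  # Vandermonde
        b = [v for _, v in points]
        candidates.append(solve_mod(A, b, q))
    return candidates
```

As a usage example under the same assumptions, take q = 13, n = 2, t = 2 and the hidden polynomial 3 + 5x + 2x^2 over F_13. Splitting its graph into a row of 7 points and a row of 6 points, and adding one consistent but wrong row and one inconsistent row, yields a candidate list that contains [3, 5, 2] and has at most one entry per row.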
We prove now the first result of the paper.

Theorem 1. If SAT is disjunctively reducible to a sparse set then $\mathrm{FP}^{\mathrm{NP}}_{\|} =
\mathrm{FP}^{\mathrm{NP}}[\log]$.

Proof. We show that the function $\chi_{\mathrm{SAT}}$, which computes the characteristic sequence
of a list of SAT queries, can be computed in $\mathrm{FP}^{\mathrm{NP}}[\log]$ under the assumption
that SAT is disjunctively reducible to a sparse set. Since $\chi_{\mathrm{SAT}}$ is complete
for $\mathrm{FP}^{\mathrm{NP}}_{\|}$, this clearly proves the equality $\mathrm{FP}^{\mathrm{NP}}_{\|} = \mathrm{FP}^{\mathrm{NP}}[\log]$.

We now design an $\mathrm{FP}^{\mathrm{NP}}[\log]$ machine $M$ for $\chi_{\mathrm{SAT}}$. Given a list of formulas
$(x_1, x_2, \ldots, x_m)$ as input, the machine $M$ first computes, with a binary search
and queries to a suitable NP oracle, the cardinality $k$ of $\{x_1, x_2, \ldots, x_m\} \cap \mathrm{SAT}$.
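The binary search in this first step can be sketched as below. The oracle is modeled as a hypothetical callable `at_least(j)` answering the NP question "are at least j of the m formulas satisfiable?"; this name and interface are assumptions for illustration only.

```python
def count_satisfiable(m, at_least):
    """Find k = number of satisfiable formulas among m using adaptive oracle queries.

    at_least(j) must be monotone: true for all j <= k and false for all j > k.
    Returns (k, number_of_queries); the search never asks about j = 0.
    """
    lo, hi = 0, m
    queries = 0
    while lo < hi:
        mid = (lo + hi + 1) // 2
        queries += 1
        if at_least(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo, queries
```

Each query halves the range of possible values of k, so the number of queries is logarithmic in m, in line with the query bound of the class the machine is meant to stay within.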

Now consider the following set $Y = \{\langle q, u, v, k, x_1, x_2, \ldots, x_m\rangle \mid 0 \le u, v \le
q - 1$, and $\exists a \in \Sigma^m$ with $k$ 1’s such that if $a_i = 1$ then $x_i \in \mathrm{SAT}$ and
$\sum_{i=1}^{m} a_i u^{i-1} = v \pmod{q}\}$.

Notice that $Y \in \mathrm{NP}$. Also, observe now that if $k$ is $\|\{x_1, x_2, \ldots, x_m\} \cap \mathrm{SAT}\|$
then there is a unique vector $a \in \Sigma^m$ with $k$ 1’s such that if $a_i = 1$ then $x_i \in \mathrm{SAT}$. Thus,
for a given triple $q, u, v$ there is at most one vector $a \in \Sigma^m$ satisfying the above
property.

Actually, we are interested only in those instances $\langle q, u, v, k, x_1, x_2, \ldots, x_m\rangle$ of
$Y$ where $q$ is a small prime. More precisely, consider an instance $(x_1, x_2, \ldots, x_m)$
of $\chi_{\mathrm{SAT}}$ of length $n$. Corresponding to this instance, we pick $q$ to be a $c \log n$
bit prime number, where we will choose $c$ later appropriately. Let $F_q$ denote
the finite field of size $q$. Notice that we can pick $q$ and construct the field $F_q$
efficiently (i.e. in time polynomial in $n$). Moreover, arithmetic in $F_q$ can also be
done efficiently.

Since $Y \in \mathrm{NP}$ there is a disjunctive reduction $f$ from $Y$ to a sparse set $S$ of
density $\|S^{\le n}\| \le p(n)$ for some polynomial $p$. I.e. $f$ is an FP function that on input
$x$ produces a set of strings $f(x)$ such that $x \in Y$ iff $f(x) \cap S \neq \emptyset$. The length of
$\langle q, u, v, k, x_1, x_2, \ldots, x_m\rangle$ using a standard pairing function can be bounded by $2n$
for large enough $n$ since $q$, $u$, $v$ and $k$ can be encoded in $3c \log n + \log n$ bits. Now,
since the reduction $f$ from $Y$ to $S$ is polynomial-time computable there is a polynomial
$r(n)$ which bounds both $\|f(\langle q, u, v, k, x_1, x_2, \ldots, x_m\rangle)\|$ and the length
of each query in $f(\langle q, u, v, k, x_1, x_2, \ldots, x_m\rangle)$. Our aim is to apply Lemma 1. Let
$Q = \bigcup_{u, v \in F_q} f(\langle q, u, v, k, x_1, x_2, \ldots, x_m\rangle)$. Write $Q = \{q_1, q_2, \ldots, q_N\}$. Build a
table $T$ with $N$ rows $R_i$ where $(u, v)$ is in $R_i$ if $q_i \in f(\langle q, u, v, k, x_1, x_2, \ldots, x_m\rangle)$.
Note that $N \le r(n) q^2$. We define row $R_i$ to be correct if $q_i \in S$. Notice that
there are at most $\|S^{\le r(n)}\| \le r(n) p(r(n))$ correct rows in $T$. Now we choose the
constant $c$ (which determines the size of the field $F_q$) so that $q / (r(n) p(r(n))) =
n^c / (r(n) p(r(n))) > n$, where we know that $n \ge m$.

Let $\chi_{\mathrm{SAT}}(x_1, x_2, \ldots, x_m) = a_1 a_2 \cdots a_m$. Then we claim that a hidden polynomial
specified by $T$ is $\sum_{i=1}^{m} a_i x^{i-1}$. To see that the conditions of Lemma 1
are fulfilled, notice that each row $R_i$ with $q_i \in S$ is indeed correct w. r. t. this polynomial and
$\bigcup_{q_i \in S} R_i$ contains precisely the $q$ pairs $(u, v)$ such that $\sum_{i=1}^{m} a_i u^{i-1} = v \pmod{q}$.

Applying the algorithm of Lemma 1 we can compute in time polynomial in $n$
a list $X$ of at most $N$ polynomials of degree $m - 1$. Each of these polynomials
gives us an $m$-bit vector of its coefficients. We discard from this list those $m$-bit
vectors which have a number of 1’s different from $k$. In the pruned list exactly
one $m$-bit vector is $\chi_{\mathrm{SAT}}(x_1, x_2, \ldots, x_m)$ and every other $m$-bit vector has a 1 at
a position where the corresponding formula in $(x_1, x_2, \ldots, x_m)$ is unsatisfiable.

The $\mathrm{FP}^{\mathrm{NP}}[\log]$ machine $M$ can now find the unique correct $m$-bit vector in
$X$ by doing a standard binary search guided by at most $\log N = O(\log n)$ queries
to a suitable NP oracle.
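The pruning and search steps at the end of the proof can be sketched as follows. The text does not spell out the oracle question, so the particular one used here is an assumption: `good_in_prefix(j)` is a hypothetical callable asking whether some vector among the first j pruned candidates has all of its 1-positions at satisfiable formulas (a question of NP type). All names are illustrative.

```python
def prune(candidates, k):
    """Keep only 0/1 vectors with exactly k ones."""
    return [a for a in candidates if all(b in (0, 1) for b in a) and sum(a) == k]

def locate(pruned, good_in_prefix):
    """Binary search for the unique good vector; assumes good_in_prefix(len(pruned)) holds."""
    lo, hi = 1, len(pruned)
    while lo < hi:
        mid = (lo + hi) // 2
        if good_in_prefix(mid):
            hi = mid
        else:
            lo = mid + 1
    return pruned[lo - 1]

# Toy data (hypothetical): formulas 1, 3 and 4 are satisfiable, so k = 3.
sat = [1, 0, 1, 1]
candidates = [[1, 1, 1, 0], [1, 0, 1, 1], [0, 1, 1, 1], [1, 1, 0, 0], [2, 0, 1, 0]]
pruned = prune(candidates, k=3)
oracle = lambda j: any(all(sat[i] for i, bit in enumerate(a) if bit) for a in pruned[:j])
found = locate(pruned, oracle)
```

Because every pruned vector other than the characteristic vector has a 1 at an unsatisfiable position, the prefix question is monotone in j and the search needs only logarithmically many queries in the length of the list.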

It is an open question if P = NP can be derived from the assumption that SAT
is disjunctively reducible to a sparse set. A first step here is to consider sparse
sets of density lower than polynomial. It is known that if SAT is disjunctively
reducible to a tally set then P = NP [Uk83, Ya83]. However, the techniques
of [Uk83, Ya83] do not work if we assume that SAT is disjunctively reducible
to a set $S$ of, say, polylog density. Reductions of SAT to sets of polylog density
were considered by Buhrman and Hermo, who showed that if SAT
is Turing reducible to a set of polylog density then $\mathrm{NP}(\log^k n) = \mathrm{NP}$ for all
$k$ (where $\mathrm{NP}(\log^k n)$ is the class of NP languages accepted by NP machines
which make at most $\log^k n$ nondeterministic moves on inputs of length $n$). We
recall here the result of Jenner and Torán [JT95] that if $\mathrm{FP}^{\mathrm{NP}}_{\|} = \mathrm{FP}^{\mathrm{NP}}[\log]$
then $\mathrm{NP}(\log^k n) = \mathrm{P}$ for each $k > 0$. Combining the above-mentioned results
with Theorem 1 immediately yields the following corollary.

Corollary 1. If SAT is disjunctively reducible to a set of polylog density then 
P = NP. 

The question whether SAT is conjunctively reducible to a co-sparse set is,
by complementation, equivalent to $\overline{\mathrm{SAT}}$ being disjunctively reducible to a sparse
set. We show that we can apply again the technique of [CNS96] to derive $\mathrm{FP}^{\mathrm{NP}}_{\|} =
\mathrm{FP}^{\mathrm{NP}}[\log]$ as a consequence.

Theorem 2. If $\overline{\mathrm{SAT}}$ is disjunctively reducible to a sparse set then $\mathrm{FP}^{\mathrm{NP}}_{\|} =
\mathrm{FP}^{\mathrm{NP}}[\log]$.

Proof. Suppose $\overline{\mathrm{SAT}}$ is disjunctively reducible to a sparse set. It suffices to show
that $\chi_{\mathrm{SAT}}$ is in $\mathrm{FP}^{\mathrm{NP}}[\log]$. Let $(x_1, x_2, \ldots, x_m)$ be an instance of $\chi_{\mathrm{SAT}}$. We can
assume w.l.o.g. that the variable sets of the formulas in $(x_1, x_2, \ldots, x_m)$ are
pairwise disjoint. On input $(x_1, x_2, \ldots, x_m)$, $k = \|\{x_1, x_2, \ldots, x_m\} \cap \mathrm{SAT}\|$ is
first computed with an $\mathrm{FP}^{\mathrm{NP}}[\log]$ computation.

We introduce some notation. Let $w$ denote an assignment to all variables in
$\{x_1, x_2, \ldots, x_m\}$. Let $x_i(w)$ denote the value of formula $x_i$ at $w$. We define a new
predicate $U(\langle q, u, v, k, x_1, x_2, \ldots, x_m\rangle, a, w)$ as follows. It is true if and only if:

$\left(\sum_{i=1}^{m} a_i = k \;\wedge \bigwedge_{a_i = 1} x_i(w) = 1\right) \Rightarrow \sum_{i=1}^{m} a_i u^{i-1} = v \pmod{q}$

Notice that U is polynomial-time computable. Consider now the following 
set 
$Z = \{\langle q, u, v, k, x_1, x_2, \ldots, x_m\rangle \mid 0 \le u, v \le q - 1$, and $\forall a \in \Sigma^m$ $\forall$ assignments
$w$: $U(\langle q, u, v, k, x_1, x_2, \ldots, x_m\rangle, a, w)\}$.

Since $U$ is a polynomial-time predicate it follows that $Z \in \mathrm{coNP}$.

Observe that if $k = \|\{x_1, x_2, \ldots, x_m\} \cap \mathrm{SAT}\|$ then $a_i = 1$ implies $x_i \in \mathrm{SAT}$
for all $i$ iff $a \in \Sigma^m$ is the characteristic vector of $x_1, x_2, \ldots, x_m$. We are interested
in instances $\langle q, u, v, k, x_1, x_2, \ldots, x_m\rangle$ of $Z$ for $q$ picked to be a small prime. If
$|(x_1, x_2, \ldots, x_m)| = n$, we will pick $q$ to be a $c \log n$ bit prime number, for an
appropriate $c$.

Since $Z \in \mathrm{coNP}$ there is a disjunctive reduction $f$ from $Z$ to a sparse set $S$
of density $\|S^{\le n}\| \le p(n)$ for some polynomial $p$. I.e. $f$ is an FP function that
on input $x$ produces a set of strings $f(x)$ such that $x \in Z$ iff $f(x) \cap S \neq \emptyset$. As in
the proof of Theorem 1, $|\langle q, u, v, k, x_1, x_2, \ldots, x_m\rangle|$ can be bounded by $2n$. There
is a polynomial $r$ such that $r(n)$ bounds both $\|f(\langle q, u, v, k, x_1, x_2, \ldots, x_m\rangle)\|$ and
the length of each query in $f(\langle q, u, v, k, x_1, x_2, \ldots, x_m\rangle)$ for $u, v \in F_q$.

The crucial property that we exploit is the following claim that is easy to
check from the definition of $Z$.

Claim. If $k = \|\{x_1, x_2, \ldots, x_m\} \cap \mathrm{SAT}\|$ then $\langle q, u, v, k, x_1, x_2, \ldots, x_m\rangle \in Z$ iff
$\sum_{i=1}^{m} a_i u^{i-1} = v \pmod{q}$ holds for $a = \chi_{\mathrm{SAT}}(x_1, x_2, \ldots, x_m)$.

Let $Q = \bigcup_{u,v \in F_q} f(\langle q, u, v, k, x_1, x_2, \ldots, x_m\rangle)$ and write $Q = \{q_1, q_2, \ldots, q_N\}$. In order to apply Lemma Q we build a table $T$ with $N$ rows $R_i$ and put $(u, v)$ in row $R_i$ if $q_i \in f(\langle q, u, v, k, x_1, x_2, \ldots, x_m\rangle)$. Note that $N \le r(n) q^2$. We define row $R_i$ to be correct if $q_i \in S$. Notice that there are at most $\|S^{\le r(n)}\| \le r(n) p(r(n))$ correct rows in $T$. Choose the constant $c$ (which determines the size of the field $F_q$) so that $q / r(n) p(r(n)) = n^c / r(n) p(r(n)) > n$, where we know that $n > m$.

As in the proof of Theorem E we can find a list of at most $N$ $m$-bit vectors one of which is $\chi_{\mathrm{SAT}}(x_1, x_2, \ldots, x_m)$, which we can locate by doing a binary search with an $\mathrm{FP}^{\mathrm{NP}}[\log]$ computation.



Corollary 2. If SAT is disjunctively reducible to a sparse set then there is a solution of (1SAT, SAT) in P, and consequently NP = RP.

We next consider disjunctive reductions from $\mathrm{Mod}_k\mathrm{P}$ to sparse sets. We first recall the definition of $\mathrm{Mod}_k\mathrm{P}$. For an NP machine $N$ let $acc_N(x)$ denote the number of accepting computations of $N$ on input $x \in \Sigma^*$. A language $L$ is in the class $\mathrm{Mod}_k\mathrm{P}$ if there is an NP machine $N$ such that $x \in L \iff acc_N(x) \not\equiv 0 \pmod{k}$.

Theorem 3. For prime $k$, if a $\mathrm{Mod}_k\mathrm{P}$-complete set is disjunctively reducible to a sparse set then there is a solution of (1SAT, SAT) in P.

Proof. We sketch the proof. Details are as in the proof of Theorem [U. Let $Y$ be the language consisting of tuples $\langle q, u, v, F\rangle$ satisfying:

- $F$ is a Boolean formula and $0 \le u, v \le q - 1$, for nonnegative integers $u$ and $v$, and positive $q$.

- If $F$ is $m$-ary then $\|\{a \in \Sigma^m \mid F(a) = 1 \text{ and } \sum_{i=1}^{m} a_i u^{i-1} \equiv v \pmod{q}\}\| \not\equiv 0 \pmod{k}$.

It is easily seen that $Y$ is in $\mathrm{Mod}_k\mathrm{P}$. Furthermore, $Y$ has the following useful property which is easy to check from its definition.

Claim: If $F$ is an instance of (1SAT, SAT) then $\langle q, u, v, F\rangle \in Y$ iff $F$ is satisfiable and $\sum_{i=1}^{m} a_i u^{i-1} \equiv v \pmod{q}$ holds for the unique satisfying assignment $a$.

Given an instance $F$ of (1SAT, SAT) of length $n$, we pick a $c \log n$ bit prime number $q$ (for appropriate $c$) and consider the disjunctive reduction $f$ from $Y$ to a sparse set $S$, as applied only to instances $\langle q, u, v, F\rangle$, for $u, v \in F_q$. As in the proof of Theorem^ we can choose the constant $c$ appropriately large (depending on the density of $S$ and the reduction $f$) so that Lemma Q can be applied to yield a polynomially bounded set of vectors, one of which is the unique satisfying assignment of $F$ assuming $F$ is satisfiable. Thus, we can decide satisfiability for instances of (1SAT, SAT).

4 Approximability and Parallel Queries to NP 

In this section we show that if SAT is $O(\log n)$-approximable then $\mathrm{FP}^{\mathrm{NP}}_{\|} = \mathrm{FP}^{\mathrm{NP}}[\log]$. We prove this result by applying the main technical idea in [Si98], where it is shown that if SAT is $O(\log n)$ approximable then the promise problem (1SAT, SAT) is in P.

Theorem 4. If SAT is $O(\log n)$ approximable then $\mathrm{FP}^{\mathrm{NP}}_{\|} = \mathrm{FP}^{\mathrm{NP}}[\log]$.

Proof. Assume SAT is $O(\log n)$ approximable. In other words, there are a constant $c$ and an FP function $f$ such that $f(\langle x_1, x_2, \ldots, x_k\rangle)$ is a $k$-bit vector different from $\chi_{\mathrm{SAT}}(x_1, x_2, \ldots, x_k)$ for any $k$-tuple of formulas $\langle x_1, x_2, \ldots, x_k\rangle$ with $k \ge c \log(\max_i |x_i|)$.

As before, we'll prove the result by giving an $\mathrm{FP}^{\mathrm{NP}}[\log]$ machine, call it $M$, that computes $\chi_{\mathrm{SAT}}$. Let $\langle x_1, x_2, \ldots, x_m\rangle$ be an input instance for $\chi_{\mathrm{SAT}}$. The first step of $M$ is to compute, via a binary search guided by a suitable NP oracle, the number $k$ of the $x_i$'s that are in SAT. As in the previous proof we will pick a suitable constant $c'$ and efficiently construct the finite field $F_q$, where $q$ is a $c' \log n$ bit prime number. For a binary vector $a = a_1 a_2 \ldots a_m \in \Sigma^m$ let $P_a$ denote the univariate polynomial $\sum_{i=1}^{m} a_i x^{i-1}$ over $F_q$. We define the following new language:

$$Z = \{\langle u, j, k, x_1, x_2, \ldots, x_m\rangle \mid \exists a \in \Sigma^m \text{ with } k \text{ 1's such that if } a_i = 1 \text{ then } x_i \in \mathrm{SAT} \text{ and the } j\text{th bit of } P_a(u) \text{ is } 1\}.$$

Clearly, this language is in NP and is therefore $O(\log n)$ approximable by hypothesis. The next two technical steps are exactly as in [Si98].

It is not hard to see that we can apply ISM Corollary 3] to get in polynomial 
time for each u G Fq a set Su Q Fq such that ||S'„|| < q^^^ and Pa{u) G Su. 
Next, applying [ALRS92] (as described in [Si98]) we can efficiently reconstruct a list of $N$ polynomials of degree $m - 1$ that includes $P_a$, where $N$ is bounded by a polynomial in $n$, the length of $\langle x_1, x_2, \ldots, x_m\rangle$.

^ Both [Si98] and [BFT97] call the promise problem UniqueSAT which can be confused with USAT. In this paper we have used Selman's notation as in

We can recover from this list of $N$ polynomials a list of $N$ $m$-bit vectors. Afterwards we discard those vectors which contain a number of 1's different from $k$. We know that, except for $\chi_{\mathrm{SAT}}(x_1, x_2, \ldots, x_m)$, which is in this list, every other $m$-bit vector has a 1 in a position where the corresponding formula in $\langle x_1, x_2, \ldots, x_m\rangle$ is unsatisfiable.

As described in the earlier proof we can find this vector by doing a binary 
search with at most log N queries to a suitable NP oracle. 

In [BFT97] it is shown that if $\mathrm{FP}^{\mathrm{NP}}_{\|} = \mathrm{FP}^{\mathrm{NP}}[\log]$ then SAT is $O(\log n)$
approximable. Combined with the above result and with Theorems EllH and0 
we have the following corollaries. 

Corollary 3. SAT is $O(\log n)$ approximable iff $\mathrm{FP}^{\mathrm{NP}}_{\|} = \mathrm{FP}^{\mathrm{NP}}[\log]$.



Corollary 4. If SAT or $\overline{\mathrm{SAT}}$ is disjunctively reducible to a sparse set then SAT is $O(\log n)$-approximable.



5 Majority Reductions to Sparse Sets 

We have a couple of observations regarding majority reductions to sparse sets. 
Majority reductions to sparse sets are interesting because they generalize both 
conjunctive and disjunctive reductions to sparse sets. A set $A$ is majority reducible to a set $B$ if there is an FP function $f$ that on input $x$ produces a set of strings $f(x)$ such that $x \in A$ iff $\|f(x) \cap B\| > \|f(x)\|/2$. In other words, the majority of strings in $f(x)$ are in $B$. By padding the list of queries generated by $f$ with a suitable number of strings (we pad with copies of either a fixed string in $B$ or a fixed string outside $B$), it follows that if $A$ is conjunctively or disjunctively reducible to $B$, then $A$ is also majority reducible to $B$. In [CNS96] randomized bpp-reductions are considered, and it is shown that if SAT is bpp-reducible to a sparse set then NP = RP. It is also easy to see that if $A$ is majority reducible to $B$ then, in fact, $A$ is bpp-reducible to $B$, with the reduction having
success probability 1/2-1- Combining this with the above stated result 

of [CNS96] we get the following corollary.

Corollary 5. If SAT is majority reducible to a sparse set then NP = RP.

We leave as an open question whether $\mathrm{FP}^{\mathrm{NP}}_{\|} = \mathrm{FP}^{\mathrm{NP}}[\log]$, or the weaker consequence that (1SAT, SAT) has a solution in P, can be proved assuming that SAT is majority reducible to a sparse set. And finally, the main open problem is to show that P = NP if SAT is disjunctively reducible to a sparse set. We note that it is open whether P = NP can be derived from the stronger assumption that the disjunctive reduction generates only a polylogarithmic number of queries.

Acknowledgments. We thank the referees for useful comments.

Abstract. Sequential selection has been solved in linear time by Blum et al. Running this algorithm on a problem of size $N$ with $N > M$, the size of the main-memory, results in an algorithm that reads and writes $O(N)$ elements, while the number of comparisons is also bounded by $O(N)$. This is asymptotically optimal, but the constants are so large that in practice sorting is faster for most values of $M$ and $N$.

This paper provides the first detailed study of the external selection problem. A 
randomized algorithm of a conventional type is close to optimal in all respects. 
Our deterministic algorithm is more or less the same, but first the algorithm builds 
an index structure of all the elements. This effort is not wasted: the index structure 
allows the retrieval of elements so that we do not need a second scan through all 
the data. This index structure can also be used for repeated selections, and can be 
extended over time. For a problem of size $N$, the deterministic algorithm reads $N + o(N)$ elements and writes only $o(N)$ elements and is thereby optimal to within lower-order terms.



1 Introduction 

Selecting an element $x$ with specified rank $n$ from a set of size $N$ is a problem that has attracted a substantial amount of interest. The first deterministic sequential algorithm running in $O(N)$ time was presented in 1972 by Blum et al. @. Later algorithms have tried to reduce the leading constant. Currently, the best results are the $2.95 \cdot N$ algorithms by Dor and Zwick 01 and Carlsson and Sundstrom 0?].

What happens if $N$ exceeds the size of the main-memory $M$? In other words: how about selection as an external-memory problem? The algorithm from @ and most later algorithms work by repeatedly reducing the problem size by a constant factor, each of these reductions requiring a few passes over the data. So, summing over all reductions, this requires a constant number of passes over the data. The number of comparisons is the same as before. Thus, these sequential algorithms lead to external algorithms that read and write $O(N)$ elements, while performing only $O(N)$ comparisons. Asymptotically this is optimal. However, our estimate in Section 2.3 shows that the algorithm performs up to $8 \cdot N$ reads and $4 \cdot N$ writes. Considering that I/O operations are much more expensive than internal operations, this algorithm will be faster than a multiway-merge sorting algorithm (which requires $N \cdot (1 + \lceil \log N / \log M \rceil)$ reads and writes) only for extreme values of $N/M$.

The first question we wanted to address was whether in practice one can perform external selection faster than sorting. The answer is affirmative: a simple randomized algorithm requires only $N + o(N)$ reads and $o(N)$ writes with $N + n + o(N)$ comparisons. This algorithm, which is similar to earlier randomized (parallel) selection algorithms, almost hits the trivial lower bounds.

The second question was whether there is a deterministic algorithm with performance close to that of the randomized algorithm. Our basic algorithm performs selection either with $2 \cdot N + o(N)$ reads and $o(N)$ writes or with $N + o(N)$ reads and $N + o(N)$ writes. The value of the $o(N)$-terms depends on the number of comparisons we are willing to make: for $s \le (M \cdot \log(N \cdot s/M^2))^{1/2}$ the lower-order terms are bounded by $O(N \cdot \log(N \cdot s/M^2)/s)$ with $O(\log s \cdot N)$ comparisons. The algorithm is based on the refined deterministic sampling technique from 0 , which in turn is similar to the sampling technique in |0|. Methods of this type have also been applied in HT17II .

The variant with $N + o(N)$ reads and writes is most interesting, because this variant allows for several refinements. It can be made adaptive: as soon as it can be ruled out that an element is a candidate for the element to select, this element is not written away. On average-case inputs, this reduces the number of writes from $N$ to at most $\ln 2 \cdot N$. Due to space limitations this algorithm is not described here. It will appear in the full version. A further refinement gives even better results, thus almost matching the performance of the randomized algorithm, and the lower bound, up to lower-order terms.

The deterministic algorithm has an additional advantage: in order to select an element with given rank, it first builds an index structure with which it can find the element with any specified rank in $o(N)$ time. This means that for repeated selections, it is more efficient than repeatedly running the randomized algorithm, which scans through all data. Furthermore, if the number of elements continues to grow, this index structure can be extended dynamically, so that the total amount of work for constructing it is only slightly more than if it had been constructed all at once.

All algorithms discussed in this paper can be parallelized a good deal, and on sys- 
tems with several hard-discs, the full speed-up can be expected. 

The remainder of this paper is organized as follows: after introducing some elementary facts in the following section, we analyze the randomized algorithm in Section 3. We then present the deterministic algorithm and its refinements. In Section 6 we consider several extensions.



2 Preliminaries 

2.1 Problem and Goals 

The selection problem is to select an element with a specified rank from a completely ordered set. Throughout this paper we will assume that the size of the set is $N$, and that the rank of the element to select is $n \le N/2$. The goal of most sequential algorithms in this field has been to minimize the number of comparisons performed.

We are also particularly interested in selection problems for the case in which N 
exceeds the size of the main-memory M. This we call the external selection problem. 
As access to secondary storage is far more expensive than internal operations, a major 
concern of any external algorithm must be to minimize the number of these accesses. 

Secondary storage is organized in pages and every time the program uses a datum 
that does not currently reside in the main-memory, a whole page of size B has to be 
brought into the main-memory. If the main-memory is full, a new page can be brought 
in only if another page is removed. The page to be removed can be selected in several 
ways, but whatever strategy is applied, there are two possibilities: if none of the data 
on this page has been changed, then it can be simply overwritten; otherwise, the page has to be written back to the secondary memory before it can be overwritten. The latter is more expensive, but the amount of additional expense depends on the underlying mechanism. In our analysis, we do not want to go into the technical details, but we do count separately the number of pages that have to be loaded into the main-memory (load operations) and the number of pages that have to be written away (store operations).

The time required for performing a comparison is denoted by $t_{\mathrm{comp}}$, that required for a load by $t_{\mathrm{load}}$ and that for a store by $t_{\mathrm{store}}$. Our goal is to minimize the number of load and store operations, while keeping the number of comparisons as small as possible.



2.2 Sorting 

In IB, it has been shown that sorting $N$ numbers on a machine with $M$ main-memory requires $\Omega(\log N / \log M) \cdot N/B$ paging operations. Sorting can be performed as follows: first all subsets of size $M$ are sorted; then an $(M/B - 1)$-way merge is performed as often as necessary. In this way sorting costs

$$T_{\mathrm{sort}}(N) = O(N \cdot \log N) \cdot t_{\mathrm{comp}} + (1 + \lceil \log(N/M) / \log(M/B - 1) \rceil) \cdot N \cdot (t_{\mathrm{load}}/B + t_{\mathrm{store}}/B).$$

In all practical cases (those with $M \ge (N/M + 1) \cdot B$), all data are loaded and stored exactly twice.
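The pass count in the I/O term above can be checked numerically. The following is a minimal sketch, assuming that $N$, $M$ and $B$ are given as integers and that the merge fan-in is exactly $M/B - 1$; the function name `merge_sort_passes` is illustrative and not taken from the paper.

```python
def merge_sort_passes(N, M, B):
    """Number of times every element is loaded (and stored) when sorting
    N elements with run formation followed by (M/B - 1)-way merging.

    Equals 1 + ceil(log(N/M) / log(M/B - 1)) from the cost formula,
    computed with integers to avoid floating-point rounding.
    """
    fan_in = M // B - 1               # number of runs merged at once
    if fan_in < 2:
        raise ValueError("need M/B - 1 >= 2 to merge")
    runs = -(-N // M)                 # ceil(N / M) sorted runs after pass one
    passes = 1                        # the run-formation pass
    while runs > 1:
        runs = -(-runs // fan_in)     # one merge pass shrinks the run count
        passes += 1
    return passes
```

With $M \ge (N/M + 1) \cdot B$ a single merge pass suffices, so the function returns 2, matching the "exactly twice" remark.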



2.3 Sequential Algorithm 

We summarize the algorithm of |2'l and analyze its quality as an external memory algorithm. The algorithm consists of the following steps:

Algorithm SEQUENTIAL_SELECTION(n, a)

1. Find the median in all groups of size $a$.

2. Find the median $m$ of these medians.

3. Divide the input into two subsets: one whose members are smaller than $m$, $\mathcal{L}$, and one whose members are larger than $m$, $\mathcal{R}$. Also determine the size $L$ of $\mathcal{L}$.

4. If $n \le L$, then recurse on $\mathcal{L}$, else set $n = n - L$ and recurse on $\mathcal{R}$.
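The four steps can be illustrated by an in-memory sketch. It is not the external-memory version analyzed here: it sorts each group to find its median, and it treats elements equal to $m$ separately so that duplicate keys are handled, which is an assumption added for robustness. The name `sequential_selection` and the default group size are illustrative.

```python
def sequential_selection(items, n, a=5):
    """Return the element of rank n (1-based) using median-of-medians."""
    items = list(items)
    if a < 3:
        raise ValueError("group size a must be at least 3")
    if not 1 <= n <= len(items):
        raise ValueError("rank out of range")
    if len(items) <= a:
        return sorted(items)[n - 1]
    # Step 1: median of every group of size a (last group may be smaller).
    groups = [items[i:i + a] for i in range(0, len(items), a)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # Step 2: median m of these medians, found recursively.
    m = sequential_selection(medians, (len(medians) + 1) // 2, a)
    # Step 3: split into the smaller and the larger elements.
    smaller = [x for x in items if x < m]
    larger = [x for x in items if x > m]
    L = len(smaller)
    equal = len(items) - L - len(larger)   # copies of m itself
    # Step 4: recurse on the side that contains rank n.
    if n <= L:
        return sequential_selection(smaller, n, a)
    if n <= L + equal:
        return m
    return sequential_selection(larger, n - L - equal, a)
```

Both recursive calls work on strictly fewer elements than the input, so the recursion terminates for any group size $a \ge 3$.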

If the size of the groups that are sorted together is denoted $a$, then the I/O-time $T(N)$ for a problem of size $N$ can be expressed in terms of the costs of the subproblems:

$$T(N) \le T(N/a) + T\Big(N - \frac{a + 1}{4 \cdot a} \cdot N\Big) + 2 \cdot N \cdot t_{\mathrm{load}}/B + (N/a + N) \cdot t_{\mathrm{store}}/B.$$

In contrast to the sequential case, the optimum is attained for large $a$. For such $a$, solving gives

$$T(N) = (8 + o(1)) \cdot N \cdot t_{\mathrm{load}}/B + (4 + o(1)) \cdot N \cdot t_{\mathrm{store}}/B.$$
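The constants 8 and 4 can be traced with a short calculation. The sketch below assumes that $a$ is large enough for the $T(N/a)$ term and the $N/a$ stores to be lower-order, and that each round keeps at most about $3N/4$ elements; under these assumptions the recurrence unrolls into a geometric series.

```latex
\begin{align*}
T(N) &\lesssim T\!\left(\tfrac{3}{4} N\right)
        + 2 N \cdot t_{\mathrm{load}}/B + N \cdot t_{\mathrm{store}}/B \\
     &\le \sum_{i \ge 0} \left(\tfrac{3}{4}\right)^{i}
        \left( 2 N \cdot t_{\mathrm{load}}/B + N \cdot t_{\mathrm{store}}/B \right) \\
     &= 4 \cdot \left( 2 N \cdot t_{\mathrm{load}}/B + N \cdot t_{\mathrm{store}}/B \right)
      = 8 N \cdot t_{\mathrm{load}}/B + 4 N \cdot t_{\mathrm{store}}/B .
\end{align*}
```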

For practical values of $M$ and $N$, sorting will be several times faster.

3 Randomized Algorithm

We give a randomized algorithm for external selection. Algorithms of this type have been given before for parallel selection. The most interesting point is the choice of the parameters so that the number of loads and stores is minimized.

The algorithm is simple: a sample of sufficient size is selected at random; two of the elements of the sample are taken as bounds for the set of keys that have to be considered in a second round; all elements are traversed and those that lie between the bounds are singled out. From now on the algorithm is applied recursively on the reduced set. The essential difference with a deterministic algorithm is that a good sample can be selected at low cost. For selecting the element with rank $n$ from a set $\mathcal{N}$ of cardinality $N$ the algorithm proceeds as follows:

Algorithm RANDOMIZED_SELECTION(n, S, $\Delta$)

1. Randomly and uniformly select a subset $\mathcal{S}$ of cardinality $S$ out of the elements of $\mathcal{N}$.

2. Select from $\mathcal{S}$ the element with rank $\lfloor S/N \cdot n \rfloor - \Delta/2$ and assign it to $x_{\mathrm{low}}$. Select from $\mathcal{S}$ the element with rank $\lfloor S/N \cdot n \rfloor + \Delta/2$ and assign it to $x_{\mathrm{high}}$.

3. Traverse $\mathcal{N}$ and add all the elements $l$ with $x_{\mathrm{low}} \le l \le x_{\mathrm{high}}$ to the subset $\mathcal{N}'$. Determine the number $N'$ of selected elements, and the number $R$ of elements $l$ with $l > x_{\mathrm{high}}$.

4. Set $L = N - N' - R$. Select the element with rank $n' = n - L$ from $\mathcal{N}'$.
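A minimal in-memory sketch of these four steps follows. Several choices are assumptions made only for illustration: the default sample size $S \approx N^{2/3}$, the base-2 logarithm in $\Delta = (\log N \cdot S)^{1/2}$, the cutoff of 64 elements, sorting the candidates instead of recursing, and the fallback to plain sorting when the wanted rank falls outside the window. The name `randomized_selection` is illustrative.

```python
import math
import random


def randomized_selection(data, n, sample_size=None, rng=random):
    """Return the element of rank n (1-based) from the list `data`."""
    N = len(data)
    if not 1 <= n <= N:
        raise ValueError("rank out of range")
    if N <= 64:
        return sorted(data)[n - 1]
    # Assumed parameter choices (not taken from the paper's theorem).
    S = min(N, sample_size or round(N ** (2 / 3)))
    half = max(1, int(math.sqrt(S * math.log2(N))) // 2)   # Delta / 2
    # Step 1: uniform random sample, sorted in memory.
    sample = sorted(rng.sample(data, S))
    # Step 2: bounds taken Delta/2 ranks around the expected position.
    centre = (S * n) // N
    low_rank, high_rank = centre - half, centre + half      # 1-based ranks
    x_low = sample[low_rank - 1] if low_rank >= 1 else None    # None = -inf
    x_high = sample[high_rank - 1] if high_rank <= S else None  # None = +inf
    # Step 3: one scan singles out the candidates and counts R.
    candidates, R = [], 0
    for x in data:
        if x_high is not None and x > x_high:
            R += 1
        elif x_low is None or x >= x_low:
            candidates.append(x)
    # Step 4: elements below x_low number L; adjust the rank.
    L = N - len(candidates) - R
    n_prime = n - L
    if not 1 <= n_prime <= len(candidates):
        return sorted(data)[n - 1]      # unlucky sample: fall back
    return sorted(candidates)[n_prime - 1]
```

Because of the fallback, the sketch always returns the correct element; the random sample only affects how small the candidate set is.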

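The complexity of the algorithm can be written as follows:

$$T(N) = 2 \cdot T(S) + T(N') + S \cdot t_{\mathrm{load}} + S \cdot t_{\mathrm{store}}/B + (2 \cdot N - R) \cdot t_{\mathrm{comp}} + N \cdot t_{\mathrm{load}}/B + N' \cdot t_{\mathrm{store}}/B. \quad (1)$$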



Lemma 1 If $\Delta$ is taken $\Delta = (\log N \cdot S)^{1/2}$, then the element with rank $n$ lies between $x_{\mathrm{low}}$ and $x_{\mathrm{high}}$, with high probability.

Proof: Denote the (unknown) element with rank $n$ by $x$. Let $X$ be the random variable giving the number of the elements selected in Step 1 that are smaller than $x$. For each element the probability that it is smaller than $x$ equals $n/N$. Thus, $X$ has an expected value of $n/N \cdot S$. Because all selections are independent, we can apply Chernoff bounds, which give

Pr(A: > n/N ■ S + A/2) < 

For A = (2 ■ log N ■ n/N ■ < (log N ■ the right-hand side is only 

The estimate for a deviation below the expected value goes analogously. □ 

Lemma 2 If $\Delta = (\log N \cdot S)^{1/2}$ then $N' = (1 + o(1)) \cdot N \cdot (\log N / S)^{1/2}$, with high probability.

Proof: The expected number of elements of $\mathcal{N}$ between any two consecutive elements of $\mathcal{S}$ is $N/S$. Thus, the expected number of elements of $\mathcal{N}$ between $x_{\mathrm{low}}$ and $x_{\mathrm{high}}$ is $\Delta \cdot N/S = N \cdot (\log N / S)^{1/2}$. Because of the process used in selecting $x_{\mathrm{low}}$ and $x_{\mathrm{high}}$, the deviation from the expected value is smaller than if everything were independent. Thus, the estimate from the Chernoff bounds gives an upper bound on the deviation. □

Theorem 1 With S = N ■ and $\Delta = (\log N \cdot S)^{1/2}$, the given randomized algorithm solves external selection in

T(N) = N ■ flood/ B +{N + n) 

^comp 

+ ■ (log TV • . {too,np + tload/B + W/B), 

with high probability. 

4 Deterministic Algorithm 

The algorithm of this section is in its structure similar to the randomized algorithm of Section 3, but deterministically it is much harder to find a good sample at low cost.

By the rank of an element $l$ in a set we mean the number of elements in the set that are smaller than or equal to $l$. So, the smallest element has rank 1. Here we assume that all elements have different keys.

The algorithm first divides the input set $\mathcal{N}$ of size $N$ into $2 \cdot N/M$ chunks of size $M/2$ each. For every such subset an index structure is constructed that later allows one to rather accurately estimate the ranks of the elements, and to retrieve the elements in a range of ranks with little effort:

Procedure FIRST_REDUCTION(s, M, N)

1. Out of the $M/2$ elements, select the elements with ranks $r \cdot M/(2 \cdot s)$, $1 \le r \le s$, as splitters.

2. Sort the elements into the $s$ buckets, and store them away in this somewhat sorted order.
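The two steps can be sketched for a single chunk held in memory. The sketch assumes distinct keys and that $s$ divides the chunk length; for brevity it finds the splitters by fully sorting a copy of the chunk, which costs more comparisons than the $O(M \cdot \log s)$ counted in the analysis. The name `first_reduction` and its return shape are illustrative.

```python
import bisect


def first_reduction(chunk, s):
    """Return (splitters, buckets) for one chunk of distinct keys.

    splitters[r-1] is the element of rank r * len(chunk) / s, so the last
    splitter is the maximum; bucket r-1 holds the elements that are larger
    than splitter r-1 and at most splitter r, in their input order.
    """
    step = len(chunk) // s
    if step == 0 or len(chunk) % s:
        raise ValueError("s must divide the chunk length")
    ordered = sorted(chunk)            # simplification: full sort of a copy
    # Step 1: the elements with ranks step, 2*step, ..., s*step.
    splitters = [ordered[r * step - 1] for r in range(1, s + 1)]
    # Step 2: distribute every element to the bucket of its splitter.
    buckets = [[] for _ in range(s)]
    for x in chunk:
        buckets[bisect.bisect_left(splitters, x)].append(x)
    return splitters, buckets
```

Each bucket ends up with exactly `len(chunk) / s` elements, which is what makes the later rank estimates possible.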

FIRST_REDUCTION is by far the most time consuming part of the whole algorithm: it dominates the I/O and the number of comparisons. Let $m_0 = 2 \cdot N/M$ denote the number of splitter sets after FIRST_REDUCTION. In $k \ge 0$ further reduction steps, the number of splitter sets is going to be repeatedly halved to finally become $m_k \le M/(2 \cdot s)$. The reduction proceeds as follows. It is illustrated in Figure 1.

Procedure FURTHER_REDUCTION(s, $m_0$, k)
for $j = 1$ to $k$ do
    for $i = 0$ to $\lfloor m_{j-1}/2 \rfloor - 1$ do
        Load splitter sets $2 \cdot i$ and $2 \cdot i + 1$, each consisting of $s$ elements.
        Merge the sets and create a new set consisting of the $s$ elements with even rank.
        Store the newly created set of splitters.
    if $m_{j-1}$ is odd then
        Load splitter set $m_{j-1} - 1$ and store a copy along with the new splitter sets.
    $m_j = \lceil m_{j-1}/2 \rceil$.
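The merge-and-halve loop can be sketched in memory as follows, assuming each splitter set is a sorted list of $s$ keys. The function name `further_reduction` is illustrative; loading and storing are replaced by ordinary list operations.

```python
from heapq import merge


def further_reduction(splitter_sets, k):
    """Apply k reduction rounds to a list of sorted splitter sets."""
    level = [list(s) for s in splitter_sets]
    for _ in range(k):
        reduced = []
        for i in range(len(level) // 2):
            # Merge sets 2i and 2i+1, keep the elements of even rank
            # (ranks 2, 4, ..., 2s in 1-based counting).
            merged = list(merge(level[2 * i], level[2 * i + 1]))
            reduced.append(merged[1::2])
        if len(level) % 2:
            # An unpaired last set is copied to the next level unchanged.
            reduced.append(level[-1])
        level = reduced
    return level
```

Each round leaves $\lceil m_{j-1}/2 \rceil$ sets of $s$ splitters, as in the last line of the procedure.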

Lemma 3 The number $k$ in FURTHER_REDUCTION should be taken $k = \lceil \log(4 \cdot N \cdot s/M^2) \rceil$.

Proof: We must check that the $k$ given is the smallest number that achieves $m_k \le M/(2 \cdot s)$. Define the function $f$ by $f(x) = \lceil x/2 \rceil$, and let $f^{(j)}$ denote the function $f$ applied $j$ times. The proof follows from the fact that $f^{(j)}(x) = \lceil x/2^j \rceil$, which can easily be proven by induction. □

Fig. 1. The set $\mathcal{N}$ of size $N = 5 \cdot M$ is first reduced to $m_0 = 10$ subsets of size $s$ each. Then these are merged and reduced until three subsets remain.



For each of the remaining splitters we now estimate the minimum and the maximum rank it may have in the set of size $2^k \cdot M/2$ from which it was extracted. Here, and in our further analysis, we assume that $m_0$ is a power of two. For other values of $m_0$, some subsets are less reduced. In these the ranks can be estimated more accurately. By the subsets at Level $j$, we mean the splitter sets that were obtained after $j$, $0 \le j \le k$, reduction rounds.

Lemma 4 For any element $l$ with rank $r$ in its subset with $s$ elements at Level $j$, $0 \le j \le k$, the rank $R$ in the subset of size $2^j \cdot M/2$ from which it was extracted satisfies

$$r \cdot 2^j \cdot M/(2 \cdot s) \le R \le (r + j/2) \cdot 2^j \cdot M/(2 \cdot s).$$

Proof: The proof follows by induction on $j$. For $j = 0$, the lemma holds: by the selection in Step 1 of FIRST_REDUCTION, an element with rank $r$ at Level 0 has exactly rank $r \cdot M/(2 \cdot s)$ in its mother set.

So, assume that the lemma holds for some $j - 1$. An element with rank $r$ at Level $j$ was obtained by merging sets $S_0$ and $S_1$ and extracting from the merged set the element with rank $2 \cdot r$. Let $r_0$, $r_1$ denote the rank of $l$ in $S_0$, $S_1$, respectively, so that $r_0 + r_1 = 2 \cdot r$. Suppose, without loss of generality, that $l \in S_0$, then

$$R \ge (r_0 \cdot 2^{j-1} + r_1 \cdot 2^{j-1}) \cdot M/(2 \cdot s) = r \cdot 2^j \cdot M/(2 \cdot s),$$

$$R \le ((r_0 + (j-1)/2) \cdot 2^{j-1} + (r_1 + (j-1)/2 + 1) \cdot 2^{j-1}) \cdot M/(2 \cdot s) = (r + j/2) \cdot 2^j \cdot M/(2 \cdot s).$$

□

In particular, Lemma 4 holds for $j = k$, the level at which all splitters can be loaded into the main-memory at the same time. For an arbitrary element $l$, we can now obtain an estimate of its rank in $\mathcal{N}$, $\mathrm{rank}(l)$, by determining its rank in each of the $m_k$ splitter sets at Level $k$. Denote the rank of $l$ in splitter set $i$ by $r_i(l)$. Then

$$\sum_{i=0}^{m_k - 1} r_i(l) \cdot 2^k \cdot M/(2 \cdot s) \le \mathrm{rank}(l) \le \sum_{i=0}^{m_k - 1} (r_i(l) + k/2 + 1) \cdot 2^k \cdot M/(2 \cdot s). \quad (2)$$

Now we can determine elements $x_{\mathrm{low}}$ and $x_{\mathrm{high}}$ that are definitely smaller and larger, respectively, than the element $x$ with $\mathrm{rank}(x) = n$ that we are looking for.






Procedure DETERMINE_BOUNDS

1. Determine the largest number $r_{\mathrm{low}}$ so that $(r_{\mathrm{low}} + (k/2 + 1) \cdot m_k) \cdot 2^k \cdot M/(2 \cdot s) < n$. From the set of splitters at Level $k$, select the element with rank $r_{\mathrm{low}}$ and assign it to $x_{\mathrm{low}}$.

2. Determine the smallest number $r_{\mathrm{high}}$ so that $r_{\mathrm{high}} \cdot 2^k \cdot M/(2 \cdot s) > n$. From the set of splitters at Level $k$, select the element with rank $r_{\mathrm{high}}$ and assign it to $x_{\mathrm{high}}$.

$$r_{\mathrm{high}} - r_{\mathrm{low}} \le (k/2 + 1) \cdot m_k + 1. \quad (3)$$

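The following deterministic counterpart of Lemma 1 states that $x_{\mathrm{low}}$ and $x_{\mathrm{high}}$ are as desired:

Lemma 5 The selected elements $x_{\mathrm{low}}$ and $x_{\mathrm{high}}$ satisfy $x_{\mathrm{low}} < x < x_{\mathrm{high}}$, where $x$ is the element with $\mathrm{rank}(x) = n$.

Proof: Applying (2) to $x_{\mathrm{low}}$ and $x_{\mathrm{high}}$ yields $\mathrm{rank}(x_{\mathrm{low}}) < n < \mathrm{rank}(x_{\mathrm{high}})$. □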

The algorithm continues by tracing $x_{\mathrm{low}}$ and $x_{\mathrm{high}}$ in the $2 \cdot N/M$ subsets of size $M/2$ at the highest level, and writing away the elements that lie between them.

Procedure SINGLE_OUT
$L = N' = 0$.
for $i = 0$ to $2 \cdot N/M - 1$ do
    In splitter set $i$ at Level 0, determine the largest splitter $x'_{\mathrm{low}}$ with $x'_{\mathrm{low}} \le x_{\mathrm{low}}$ and the smallest splitter $x'_{\mathrm{high}}$ with $x'_{\mathrm{high}} \ge x_{\mathrm{high}}$.
    $L = L + M/(2 \cdot s) \cdot$ (rank of $x'_{\mathrm{low}}$ in the subset with $s$ elements at Level 0) $- 1$.
    for each element $l$ from a bucket between $x'_{\mathrm{low}}$ and $x'_{\mathrm{high}}$ do
        if $l < x_{\mathrm{low}}$ then $L = L + 1$;
        else if $l < x_{\mathrm{high}}$ then $N' = N' + 1$; add $l$ to $\mathcal{N}'$.



Lemma 6 $N' \le ((\log(N \cdot s/M^2) + 5) \cdot M/(2 \cdot s) + 1) \cdot 4 \cdot N/M$.

Proof: From (3) it follows that in the set of splitters at Level $k$ the ranks of $x_{\mathrm{low}}$ and $x_{\mathrm{high}}$ differ by at most $(k/2 + 1) \cdot m_k + 1$. Bounding $\mathrm{rank}(x_{\mathrm{low}})$ from below and $\mathrm{rank}(x_{\mathrm{high}})$ from above with (2) gives

$$N' \le ((k + 2) \cdot m_k + 1) \cdot 2^k \cdot M/(2 \cdot s).$$

Substituting the value of $k$ from Lemma 3 gives the result. □
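The substitution can be checked numerically. The sketch below assumes base-2 logarithms and parameters for which $m_0 = 2 \cdot N/M$ is a power of two; the helper names `reduction_rounds` and `candidate_bounds` are illustrative. It evaluates both the bound from the proof and the closed form of Lemma 6.

```python
from math import log2


def reduction_rounds(N, M, s):
    """Smallest k >= 0 with 2**k >= 4*N*s / M**2, as in Lemma 3."""
    k = 0
    while (M * M << k) < 4 * N * s:
        k += 1
    return k


def candidate_bounds(N, M, s):
    """Return (k, m_k, bound from the proof, bound stated in Lemma 6)."""
    k = reduction_rounds(N, M, s)
    m0 = 2 * N // M                     # splitter sets after the first reduction
    mk = -(-m0 // 2 ** k)               # ceil(m0 / 2**k) sets remain
    from_proof = ((k + 2) * mk + 1) * 2 ** k * M / (2 * s)
    lemma6 = ((log2(N * s / M ** 2) + 5) * M / (2 * s) + 1) * 4 * N / M
    return k, mk, from_proof, lemma6
```

For $N = 2^{30}$, $M = 2^{20}$ and $s = 2^{12}$ this gives $k = 4$ and $m_k = 128$, and the candidate set is bounded by roughly $1.6 \cdot 10^6$ elements out of about $10^9$.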

The selection algorithm, which we call DETERMINISTIC_SELECTION, is completed recursively by selecting the element with rank $n - L$ from the set $\mathcal{N}'$ with $N'$ elements.

Theorem 2 DETERMINISTIC_SELECTION solves external selection in

$$T(N) = N \cdot (1 + O((s + B)/M + \log(N \cdot s/M^2)/s)) \cdot t_{\mathrm{load}}/B + N \cdot (1 + O(s/M + \log(N \cdot s/M^2)/s)) \cdot t_{\mathrm{store}}/B + N \cdot O(\log s) \cdot t_{\mathrm{comp}}.$$

Proof: During FIRST_REDUCTION, all elements are loaded, and $N + 2 \cdot N/M \cdot s$ elements are stored away. For each subset of size $M/2$, $O(M \cdot \log s)$ comparisons are required; in total this gives $O(N \cdot \log s)$ comparisons. The cost of FURTHER_REDUCTION can be estimated as twice the cost of its first round, because the problem size is halved in each round. Thus, $4 \cdot N/M \cdot s$ elements must be loaded and $2 \cdot N/M \cdot s$ elements stored. The number of comparisons is bounded by $O(N/M \cdot s)$. The time consumption of DETERMINE_BOUNDS is negligible in comparison to that of the previous procedures. In SINGLE_OUT all splitters at Level 0 must be loaded. Their number is $2 \cdot N/M \cdot s$. Thereafter, at most $N' + 2 \cdot (M/(2 \cdot s) + B) \cdot 2 \cdot N/M = N' + 2 \cdot N/s + 4 \cdot B \cdot N/M$ elements must be loaded, taking into account that for every splitter subset two almost useless pages may have to be loaded. Then the $N'$ elements that are singled out are stored away. The number of comparisons is small: $O(N' + N/s)$. Summing over all procedures gives

$$T_{\mathrm{load}}(N) = (N + 6 \cdot N/M \cdot s + N' + 2 \cdot N/s + 4 \cdot B \cdot N/M) \cdot t_{\mathrm{load}}/B + T_{\mathrm{load}}(N'),$$

$$T_{\mathrm{store}}(N) = (N + 4 \cdot N/M \cdot s + N') \cdot t_{\mathrm{store}}/B + T_{\mathrm{store}}(N'),$$

$$T_{\mathrm{comp}}(N) = O(N \cdot \log s) \cdot t_{\mathrm{comp}} + T_{\mathrm{comp}}(N').$$

□

The choice of $s$ gives a trade-off between the computation time and the time for I/O. The I/O is minimized for $s = (M \cdot \log(N \cdot s/M^2))^{1/2}$. On a system where the amount of writable memory is small, most of the writing can be replaced by additional reading: if Step 2 of FIRST_REDUCTION is omitted, then the subset $\mathcal{N}'$ can be found by scanning through all the data, just as was done in Step 3 of RANDOMIZED_SELECTION.



5 Minimizing the Writing 

In the presented deterministic algorithm, all elements that still had to be taken into account were stored away in bucket-sorted order. This has the advantage that all elements belonging to the same bucket are stored together, which is of great importance for accessing them at low cost. However, it also implies that the total amount of I/O (loads + stores) exceeds $2 \cdot N$. We show that large savings are possible.

At the lower levels the algorithm is the same as before. Only at the top level are there 
some differences. If a new batch of M /2 elements is loaded into the main-memory, they 
are processed as follows: 

Procedure FIRST_REDUCTION'(s, M, N)

1. Select the elements with ranks $r \cdot M/(2 \cdot s)$, $1 \le r \le s$, out of the $M/2$ elements as splitters.

2. Traverse the elements. For each element $l$ determine to which bucket $b$ it belongs. Let $i_l$ be the index of $l$ within $\mathcal{N}$. Let $l'$, with index $i_{l'}$, be the previous element allocated to bucket $b$ (if $l$ is the first element of bucket $b$, then $i_{l'} = 0$). Write $i_l - i_{l'}$ into bucket $b$. Store all buckets away.



Lemma 7 For a given number of buckets $s$, the positions of the elements can be encoded in Step 2 in at most $2 \cdot N \cdot (1 + \log s)$ bits.

Proof: We apply a simple encoding: each number $j = i_l - i_{l'}$ is written in binary, and between every two bits a 1 is inserted. After the last bit a 0 is added. In this way, $j$ requires $2 \cdot (\lfloor \log j \rfloor + 1)$ bits. If in total $z$ elements $j_0, \ldots, j_{z-1}$ are written whose sum equals $M/2$, then this requires at most $2 \cdot z \cdot (1 + \log(M/(2 \cdot z)))$ bits. In our case $z = M/(2 \cdot s)$, and there are a total of $2 \cdot s \cdot N/M$ buckets. □
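The encoding in this proof can be written out directly. The sketch below represents bit strings as Python strings of `"0"` and `"1"` for readability and assumes every gap $j$ is at least 1; the names `encode_gap` and `decode_gaps` are illustrative.

```python
def encode_gap(j):
    """Encode j >= 1: each binary digit is followed by a flag bit,
    1 if more digits follow and 0 after the last digit."""
    if j < 1:
        raise ValueError("gaps must be positive")
    bits = bin(j)[2:]
    return "".join(b + "1" for b in bits[:-1]) + bits[-1] + "0"


def decode_gaps(stream):
    """Decode a concatenation of encoded gaps back into a list of ints."""
    gaps, digits = [], ""
    for pos in range(0, len(stream), 2):
        digits += stream[pos]            # data bit
        if stream[pos + 1] == "0":       # flag bit 0 ends the number
            gaps.append(int(digits, 2))
            digits = ""
    return gaps
```

For example, 5 is `101` in binary and is encoded as `110110`, which uses $2 \cdot (\lfloor \log 5 \rfloor + 1) = 6$ bits.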

Later, during SINGLE_OUT, while collecting the candidates, it may happen that for each candidate a whole page has to be brought into the main-memory. From Lemma 6 we know that when $s$ splitters are selected out of every $M/2$ elements, this yields a total of $N' = O(N/s \cdot \log N)$ candidates. Thus, we have to bring $O(N/s \cdot \log N \cdot B)$ elements into the main-memory.

Theorem 3 For a given word-length $w$ and $s = \omega(\log N \cdot B)$, the algorithm runs in

$$T(N) = (N + o(N)) \cdot t_{\mathrm{load}}/B + O(N \cdot \log s / w) \cdot t_{\mathrm{store}}/B + O(N \cdot \log M) \cdot t_{\mathrm{comp}}.$$

For $w = \log N$ and $s = \log^2 N \cdot B$, the loading is close to optimal, while the storing is reduced to $2 \cdot N \cdot \log B / \log N + o(N)$. The encoding can easily be refined to reduce the factor 2 to less than 1.5. This means that even for current values of $\log B$ (between 10 and 14) and $w$ (32 or 64), this approach may have practical importance.



6 Extensions 



Multiple Selections. Consider the problem of selecting several elements with specified rank. Here we must distinguish the case in which all ranks are provided at the same time from the case in which several requests have to be served in turn.

If $c$ elements must be selected that are all provided at the same time, then the randomized algorithm determines $x_{\mathrm{low}}$ and $x_{\mathrm{high}}$ for each of them. After sorting these values, the status of each scanned element can be determined with a binary search. Some extra care must be taken with overlapping intervals, but in principle the selection can be performed with $N$ reading, $c \cdot N'$ writing and $O(N \cdot \log c)$ comparisons. If the $c$ elements are provided one-by-one, then the randomized algorithm given in Section 3 is not good: for every element all elements must be traversed, multiplying the time consumption by almost a factor of $c$.

Interestingly, for the deterministic algorithm it is no particular advantage to get all $c$ elements specified at the start: in any case it will handle them one-by-one, using the same index structure again and again. Whether the refinement of Section 5 is advantageous depends on the value of $c$. Using the basic algorithm of Section 4 yields the following analogue of Theorem 2.

Theorem 4 DETERMINISTIC_SELECTION solves $c$ external selection problems on the same set $\mathcal{N}$ in

$$T(N) = N \cdot (1 + O(c \cdot (s + B)/M + c \cdot \log(N \cdot s/M^2)/s)) \cdot t_{\mathrm{load}}/B + N \cdot (1 + O(s/M + c \cdot \log(N \cdot s/M^2)/s)) \cdot t_{\mathrm{store}}/B + N \cdot O(\log s + c \cdot \log(N \cdot s/M^2)/s) \cdot t_{\mathrm{comp}}.$$

The term $O(N \cdot c \cdot s/M \cdot t_{\mathrm{load}}/B)$ is due to the repeated loading of all splitters at Level 0. If all elements are specified at the start, then for large $c$, it becomes profitable to modify the algorithm so that these splitters have to be loaded only once. Then, $s$ can be chosen close to $M$ and $c$ up to $c = o(\min\{M/B, M/\log(N/M)\})$.

Incremental Input. So far we have assumed that all N elements were provided at 
the start. However, one may imagine a set of elements that grows over time, while 
occasional selections must be made in between. For the deterministic algorithm, it is no 
great problem to build the index structure incrementally at little additional cost. 

If we assume that new elements are added in multiples of M/2, then all reductions 
have to be performed once. Only the copying of the splitter sets at the far right side 
gives some waste, but this can be eliminated easily. 

More Powerful Hardware. An important issue is whether an algorithm can be run
on a system with $D > 1$ hard-disc units or on a system with $P > 1$ processors. For
all algorithms presented this is easy: the time consumption is dominated by the first 
reduction round, and the last round in which the candidates are searched together. In 
the first round 2 • N/M chunks of M/2 elements have to be loaded. This can easily 
be sped up by using up to M/(2 • B) hard-discs. All chunks could also be processed 
in parallel. In the final round, parallelism with P up to 2 • N/M would again be no 
problem to organize. Applying the algorithm given earlier after decoding the indices, a
considerable number of hard-discs can be used effectively for collecting all candidates 
which stand scattered over the whole input. 



7 Conclusion 

The algorithms presented in this paper solve the external selection problem almost com- 
pletely. Only the lower-order terms might be further reduced. For the deterministic al- 
gorithm, one would also like to reduce the number of comparisons, even though on the 
extremely fast modern computers the time required to perform them is small in
comparison to the time for scanning through the data.

Abstract. In this paper we present an algorithm which shows that the
exponential function has algebraic complexity $O(\log^2 n)$, i.e., can be
evaluated with relative error $O(2^{-n})$ using $O(\log^2 n)$ infinite-precision
additions, subtractions, multiplications and divisions. This solves a question
of J. M. Borwein and P. B. Borwein.

The best known lower bound for the algebraic complexity of the
exponential function is $\Omega(\log n)$.

The best known upper and lower bounds for the bit complexity of the
exponential function are $O(\mu(n) \log n)$ [2] and $\Omega(\underline{\mu}(n))$, respectively,
where $\mu(n)$ denotes an upper bound and $\underline{\mu}(n)$ denotes a lower bound for
the bit complexity of $n$-bit integer multiplication.

The presented algorithm has bit complexity $O(\mu(n) \log n)$.



1 Introduction 

This paper deals with fast algorithms for the computation of the exponential
function. We consider the number of operations sufficient to evaluate $\exp(x)$
with relative error $< 2^{-n}$ for $x$ in some (fixed nontrivial compact) interval $[p, q]$.
For inputs $x \notin [p, q]$ we may apply range reduction techniques (preserving
the complexity bounds), thus we assume w.l.o.g. $[p, q] = [0, \ln(2)]$.

The algorithms are analyzed with respect to algebraic complexity,
operational complexity, and bit complexity. (Some authors use these terms with
different meanings.) These concepts of complexity are made precise in Sect. 2.
“Fast” algorithms are meant to be asymptotically fast with respect to such a
complexity measure.

The traditional way to compute the exponential function or the natural
logarithm is to use a partial sum of the Taylor series or a related polynomial or
rational approximation. When we wish to compute $n$ digits of the natural
logarithm using a Taylor series, then we employ a polynomial of degree $n$ and perform
$O(n)$ rational operations, while for the exponential function $O(n/\log n)$ rational
operations are sufficient. The improvement for the exponential function
reflects the faster convergence of the Taylor series. Repeated use of $\ln(x^2) = 2\ln(x)$
or $\exp(2x) = (\exp(x))^2$, respectively, reduces the inputs $x$ to values close to 1 or
0, respectively, leading to algorithms for evaluating $\ln(x)$ or $\exp(x)$ using Taylor
series with $O(\sqrt{n})$ rational operations and bit complexity $O(\mu(n)\sqrt{n})$.



C. Meinel and S. Tison (Eds.): STACS’99, LNCS 1563, pp. 302 ff., 1999.
© Springer-Verlag Berlin Heidelberg 1999

In the model of algebraic complexity, infinite-precision rational operations
($+$, $-$, $\times$, $/$) with real numbers are accounted for with unit cost. Using the
arithmetic-geometric mean (AGM) iteration, it can be shown that the natural
logarithm has algebraic complexity $O(\log^2 n)$. The exponential function may
be evaluated using Newton’s method for inverting the logarithm. This leads to
an algorithm with algebraic complexity $O(\log^3 n)$. We refer to J. M. Borwein,
P. B. Borwein, and R. P. Brent for these results.

The main result of this paper, established in Sect. 3, is an algorithm for
the exponential function with algebraic complexity $O(\log^2 n)$. Our algorithm
“couples” the $O(\log n)$ many AGM iterations performed for the computation of
$\exp(x)$. By re-using internal results from prior iterations we speed up the internal
square root operations. This saves a factor of $\log n$.

In the model of operational complexity, infinite-precision rational operations
and infinite-precision extraction of $m$th roots of real numbers are accounted for
with unit cost. The AGM iteration leads to an algorithm with operational
complexity $O(\log n)$ for the natural logarithm, and the use of Newton’s method
for inverting the natural logarithm gives an algorithm with operational
complexity $O(\log^2 n)$ for the exponential function [2].

Both functions, natural logarithm and exponential function, have bit
complexity $O(\mu(n) \log n)$, where $\mu(n)$ is an upper bound for the bit complexity
of $n$-bit integer multiplication (assuming that $\mu(n)/n$ is monotonically
non-decreasing). This bound follows from an analysis of the above algorithms (see
again J. M. Borwein, P. B. Borwein, and R. P. Brent).

The exponential function can be evaluated with Boolean circuits of depth 
O(logn) and size (H. Alt |3|). This result has been improved by J.H. Reif 
pg and others. In PI, Y. Okabe, N. Takagi, and S. Yajima describe Boolean 
circuits with depth O(logn) and size 0(n^/log^n). It is still unknown whether 
depth $O(\log n)$ and size $O(\mu(n) \log n)$ can be achieved simultaneously.

An overview of the results is shown in Table 1 (partially cited from earlier work).

The idea of re-using internal results in AGM computations has been
mentioned before by J. M. Borwein, P. B. Borwein, Y. Kanada, and T. Sasaki,
Y. Kanada. There, the idea was only used to reduce the constant factor in
the complexity bound, while it helps saving a factor of $\log n$ in the algorithm
presented here. Particularly, the idea of using internal results from prior nested
iterations seems to be new in this context.

Table 1. Overview of function and constant complexities.

Type of function/constant   O_alg                     O_op       O_bit
(1) Addition                1                         1          n
(2) Multiplication          1                         1          n log n log log n
(3) Algebraic               log n                     1          μ(n)
(4) ln                      log^2 n                   log n      μ(n) log n
(5) exp                     log^3 n; log^2 n (new)    log^2 n    μ(n) log n
(6) π, ln(2)                log^2 n                   log n      μ(n) log n

The algorithm has bit complexity $O(\mu(n) \log n)$ which is the best known
asymptotic bound. The result holds for computations performed on a multitape
Turing machine.

In Sect. 2 we describe our models of computation. Section 3 starts with a
short summary of Newton’s method, AGM iteration, and their use for computing
the exponential function. Finally, we introduce the idea of coupling iterations
and present our new algorithm with its complexity estimates. Practical results
are given in Sect. 4.

2 Model of Computation 

For the description of the complexity measures we use definitions from the
literature. The subscripts on the order symbols are for emphasis only.

Definition 1 (Algebraic Complexity).

A function $f$ has algebraic complexity $O_{alg}(s(n))$ on a set $A$ if there exists a
sequence of rational functions $R_n$ such that

(i) $|R_n(x) - f(x)| < 2^{-n}$ for all $x \in A$;

(ii) $R_n$ can be evaluated using at most $O(s(n))$ rational operations (i.e.,
infinite-precision additions, subtractions, multiplications, and divisions).

Many authors prefer the concept of arithmetic circuits or algebraic circuits,
where the size of the circuits coincides with what we call algebraic complexity.

Definition 2 (Operational Complexity). 

A function $f$ has operational complexity $O_{op}(s(n))$ on a set $A$ if there exists a
sequence of algebraic functions $A_n$ such that

(i) $|A_n(x) - f(x)| < 2^{-n}$ for all $x \in A$;

(ii) $A_n$ can be evaluated using at most $O(s(n))$ rational operations or extractions
of $m$-th roots.

This measure allows us, for example, to use square root extractions in the
computation of approximations and to count them on an equal level with the operations
$+$, $-$, $\times$, $/$. This is often appropriate because, from a bit complexity point of
view, root extraction is equivalent to multiplication.

In practice, we have to take into account that low-precision operations are 
cheaper than high-precision operations. 

Definition 3 (Bit Complexity).

A function $f$ has bit complexity $O_{bit}(s(n))$ on a set $A$ if there exists a sequence
of approximations $B_n$ such that

(i) $|B_n(x) - f(x)| < 2^{-n}$ for all $x \in A$;

(ii) there is a multitape Turing machine which (given input $n$ and $x$) computes
$B_n(x)$ with $O(s(n))$ steps.

Throughout this paper, we assume that $\mu(n)/n$ is monotonically
non-decreasing. Our assumption is certainly valid if the Schönhage-Strassen method
is used to multiply $n$-bit integers (in the usual binary representation) in
$O(n \log n \log\log n)$ single-digit operations.

Real numbers are assumed to be represented internally as binary rationals
$x = m/2^n$ with $m \in \mathbb{Z}$, $n \in \mathbb{N}$. More precisely, every real number is approximated
by binary rationals up to certain errors.



3 Algorithms 

3.1 Newton Iteration 

Given algorithms for addition and multiplication, algebraic functions can be
calculated by applying Newton’s method to solving equations of the form
$f(y) - x = 0$. For example, Newton’s method for $y^2 - x = 0$ leads to the
iteration

$$y_{i+1} := y_i - (y_i^2 - x)/(2 y_i) \tag{1}$$

which converges quadratically to $\sqrt{x}$. Thus $O(\log n)$ iteration steps give $n$ digits
of accuracy and we have an algorithm with algebraic complexity $O(\log n)$ for
square root extraction.
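
Iteration (1) can be tried out with exact rational arithmetic. The following is a minimal sketch, assuming Python's `fractions.Fraction` as a stand-in for the infinite-precision rational operations of the algebraic model; the function name `newton_sqrt` and the starting value 1 are illustrative choices, not taken from the paper.

```python
from fractions import Fraction

def newton_sqrt(x, steps, y0=1):
    """Apply iteration (1), y <- y - (y*y - x) / (2*y), with exact rationals."""
    x = Fraction(x)
    y = Fraction(y0)
    for _ in range(steps):
        y = y - (y * y - x) / (2 * y)
    return y

def residual(x, steps):
    """Return |y^2 - x| after the given number of steps, as an exact rational."""
    y = newton_sqrt(x, steps)
    return abs(y * y - Fraction(x))
```

Calling `residual(2, steps)` for increasing `steps` shows the number of correct digits roughly doubling per step, which is why $O(\log n)$ steps are enough for $n$ digits.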

Newton’s method is self-correcting in the sense that a “small” perturbation
in $y_i$ does not change the limit. Therefore it is possible to start with a single-digit
estimate and roughly double the precision with each iteration step. Thus the bit
complexity of root extraction is $O(\mu(n) + \mu(n/2) + \mu(n/4) + \cdots + \mu(1)) = O(\mu(n))$,
as the terms can be summed up in a geometric series.
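
The geometric-series step can be spelled out as follows. This is a short derivation sketch that relies only on the assumption, made throughout the paper, that $\mu(n)/n$ is monotonically non-decreasing; the index $j$ is introduced here for illustration.

```latex
% Monotonicity of \mu(n)/n gives, for every j >= 0,
%   \mu(n/2^j) / (n/2^j) <= \mu(n)/n,   i.e.   \mu(n/2^j) <= 2^{-j} \mu(n).
\[
  \sum_{j \ge 0} \mu\!\left(\frac{n}{2^j}\right)
  \;\le\; \mu(n) \sum_{j \ge 0} 2^{-j}
  \;=\; 2\,\mu(n)
  \;=\; O(\mu(n)).
\]
```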

Newton iteration for square root extraction is only a special case of the more
general task to invert an arbitrary function. For any “reasonable” $f$, the iteration

$$y_{i+1} := y_i - (f(y_i) - x)/f'(y_i)$$

yields an algorithm for $f^{-1}$ with the same bit complexity as for $f$.

Using Newton’s method without further considerations for inverting
multiplies the algebraic and operational complexities by $\log n$. Thus the upper bound
$O(\log^2 n)$ for the algebraic complexity of the natural logarithm yields $O(\log^3 n)$
as an upper bound for the algebraic complexity of the exponential function.
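
The general inversion step can be illustrated with ordinary floating-point numbers. This is only a sketch: the helper name `newton_inverse`, the fixed step count, and the test function $f(y) = y^3 + y$ are assumptions made for the example and do not appear in the paper.

```python
def newton_inverse(f, f_prime, x, y0, steps=30):
    """Approximate f^{-1}(x) with the iteration y <- y - (f(y) - x) / f'(y)."""
    y = y0
    for _ in range(steps):
        y = y - (f(y) - x) / f_prime(y)
    return y

def cubic(y):
    """Hypothetical example function f(y) = y^3 + y (strictly increasing)."""
    return y ** 3 + y

def cubic_prime(y):
    """Derivative of the example function."""
    return 3 * y ** 2 + 1
```

For instance, `newton_inverse(cubic, cubic_prime, 10.0, 1.0)` approaches 2, since $2^3 + 2 = 10$. Each step costs one evaluation of $f$, which is the source of the extra factor $\log n$ mentioned above.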



3.2 The Arithmetic-Geometric Mean (AGM)

Newton’s method cannot suffice for the fast computation of elementary
transcendental functions since, if $f$ is algebraic in the Newton iteration above,
then the limit $f^{-1}$ is also algebraic.

The most familiar iteration that converges quadratically to a transcendental
function is the arithmetic-geometric mean iteration of Gauß and Legendre for
computing elliptic integrals.

Given two positive numbers $a = a_0$ and $b = b_0$ with $a_0 \ge b_0$, we define

$$a_{j+1} := (a_j + b_j)/2 , \qquad b_{j+1} := \sqrt{a_j b_j}$$

for $j \in \mathbb{N}$. It can be shown that $(a_j)$ and $(b_j)$ converge quadratically to a
common limit $\mathrm{AGM}(a, b)$, which can be expressed in terms of a complete
elliptic integral of the first kind,

$$I(a, b) := \int_0^{\pi/2} \frac{d\theta}{\sqrt{a^2 \cos^2\theta + b^2 \sin^2\theta}} ,$$

namely

$$\mathrm{AGM}(a, b) = \frac{\pi}{2 \cdot I(a, b)} .$$
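
The connection between the AGM and the elliptic integral, $\mathrm{AGM}(a, b) = \pi / (2 \cdot I(a, b))$, can be checked numerically. The sketch below uses double-precision floats and a midpoint rule for the integral; the function names, the fixed iteration count, and the number of integration pieces are assumptions chosen for illustration.

```python
import math

def agm(a, b, steps=40):
    """Arithmetic-geometric mean of two positive floats via the AGM iteration."""
    for _ in range(steps):
        a, b = (a + b) / 2, math.sqrt(a * b)
    return a

def elliptic_integral(a, b, pieces=2000):
    """Midpoint-rule estimate of I(a, b), integrating over [0, pi/2]."""
    h = (math.pi / 2) / pieces
    total = 0.0
    for i in range(pieces):
        t = (i + 0.5) * h
        total += h / math.sqrt((a * math.cos(t)) ** 2 + (b * math.sin(t)) ** 2)
    return total
```

Comparing `agm(1.0, 0.25)` with `math.pi / (2 * elliptic_integral(1.0, 0.25))` gives agreement to many digits, while the AGM side needs only a handful of square roots.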

The relation between the natural logarithm and the elliptic integral $I(\cdot,\cdot)$
can be stated as follows:

Proposition 1.

$$|\ln(4/k) - I(1, k)| \le 4k^2 (8 + |\ln k|) \quad \text{for } k \in (0, 1] . \tag{2}$$

(For a proof see 0 Prop. 2].) 



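To get a low complexity algorithm for $\ln(y)$ we use estimate (2) and set
$k := 4/(y \cdot 2^m)$ with shift $m$ such that $k \le 2^{-n/2}$, leading to

$$\ln(y) + m \cdot \ln(2) \approx \frac{\pi}{2 \cdot \mathrm{AGM}(1, 4/(y \cdot 2^m))} ,$$

where the approximation error is bounded by the right-hand side of (2).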



The AGM iteration is not self-correcting. Therefore all iteration steps are
performed with precision $O(n + \log n) = O(n)$, where the additional $O(\log n)$ bits
are used to avoid cancellation of significant bits.
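
The logarithm formula can be sketched in double precision. This is an illustration only: `math.pi` and `math.log(2.0)` stand in for the precomputed constants $\pi$ and $\ln(2)$, the shift `m=40` is an arbitrary choice that makes $k$ small enough for floats, and the function names are hypothetical.

```python
import math

def agm(a, b, steps=60):
    """Arithmetic-geometric mean of two positive floats."""
    for _ in range(steps):
        a, b = (a + b) / 2, math.sqrt(a * b)
    return a

def ln_via_agm(y, m=40):
    """Estimate ln(y) as pi / (2 * AGM(1, k)) - m * ln(2) with k = 4 / (y * 2**m)."""
    k = 4.0 / (y * 2.0 ** m)
    return math.pi / (2.0 * agm(1.0, k)) - m * math.log(2.0)
```

Note the subtraction of $m \cdot \ln(2)$ at the end: it removes leading digits, which is the cancellation that the additional $O(\log n)$ bits of working precision are meant to absorb.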

Up to computing $\pi$ and $\ln(2)$, this allows for the derivation of algorithms with
the complexity of entry (4) in Table 1. Algorithms for $\pi$ and $\ln(2)$ can be derived
from similar kinds of considerations. Using Newton’s method
for inverting the natural logarithm, we get an algorithm with the complexity of
entry (5) in Table 1.



3.3 An Example of Coupled Iteration 

In this section we give an example of a coupled iteration which serves as moti- 
vation for this technique. Therefore we avoid a formalized description. 

In a typical situation of Newton iteration, we perform $O(\log n)$ steps and the
precision is doubled in each step. In Fig. 1(a) we see an iteration with a nested
iteration. This is, from a bit complexity point of view, the situation in iteration (1),
as the division by $2y_i$ involves Newton iteration for computing approximations
for $1/(2y_i)$. Therefore a coupled iteration for $y_i \approx \sqrt{x}$ and $z_i \approx 1/(2\sqrt{x})$
with re-use of internal results from prior iteration steps (Fig. 1(b)) seems
to be considerably faster. The savings in comparison to case (a) are obvious.

Fig. 1. Coupled iteration schemes: (a) typical scheme of a Newton iteration
with nested iteration; (b) Newton iteration where the nested iteration uses
internal results from prior iteration steps.

We restate the main idea: Figure 1(b), applied to the Newton iteration for
the computation of approximations to $\sqrt{x}$, corresponds to the coupled iteration

$$y_{i+1} := y_i - (y_i^2 - x) \cdot z_i , \qquad z_{i+1} := z_i + (1 - 2 y_i z_i) \cdot z_i .$$

By coupling these iterations we avoid the repeated computations of the
beginnings of approximations to $1/(2y_i) \approx 1/(2\sqrt{x})$. A careful implementation of this
idea leads to an algorithm which is superior to the Newton iteration without
division which is preferred by many authors. Details on the coupled Newton
iteration for square root extractions can be found in the literature.
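
A floating-point sketch of the coupled iteration for $y_i \approx \sqrt{x}$ and $z_i \approx 1/(2\sqrt{x})$ is given below. The function name, the step count, and the starting values are assumptions made for the example; in particular, the rough starting guesses are picked by hand here and are not part of the paper's description.

```python
import math

def coupled_sqrt(x, y0, z0, steps=12):
    """Coupled iteration: y tracks sqrt(x), z tracks 1/(2*sqrt(x)); no division is used."""
    y, z = y0, z0
    for _ in range(steps):
        # Both updates read the old pair (y_i, z_i); tuple assignment evaluates
        # the right-hand side before either variable is overwritten.
        y, z = y - (y * y - x) * z, z + (1.0 - 2.0 * y * z) * z
    return y, z
```

With `coupled_sqrt(2.0, 1.5, 0.3)` the pair settles near $(\sqrt{2}, 1/(2\sqrt{2}))$. The update of `z` is a single Newton step for the reciprocal of $2y_i$ that starts from the previous `z`, so the reciprocal is never recomputed from scratch.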



3.4 Coupling of Newton’s Method and AGM Iterations 

Remember that we assume that the input $x$ of the algorithm for computing
$\exp(x)$ lies in the interval $[p, q] = [0, \ln(2)]$. This assumption is only for the
purpose of a simplified description and done w.l.o.g., as we may use range reduction
techniques [13] preserving the complexity bounds given in Table 1.



Simple Re-use of Internal Results. An AGM iteration can be split into two
parts: In part I $a_j/b_j > 2$ holds, whereas in part II $a_j/b_j \le 2$ is satisfied.

In part II we may easily re-use $b_j$ (or $a_j$) as a starting value for the
computation of an approximation for $b_{j+1} = \sqrt{a_j b_j}$, because the first significant
bit(s) of $b_j$ (or $a_j$, respectively) and $b_{j+1}$ agree. This is what J. M. Borwein,
P. B. Borwein, Y. Kanada, and T. Sasaki, Y. Kanada suggest. This
approach reduces the constant factor in the bit complexity $O(\mu(n) \log n)$, but
the algebraic complexity of the exponential function is still $O(\log^3 n)$.

Applying Newton’s method to equation $\ln(y) - x = 0$ leads to an iteration

$$y_{i+1} := y_i - (\ln(y_i) - x) \cdot y_i$$

for approximations $y_i \approx \exp(x)$. The computation of $y_{i+1}$ is performed with
relative precision $O(2^{-2^{i+1}})$, i.e., $\ln(y_i)$ is computed with relative precision $O(2^{-2^{i+1}})$.
This situation is shown in Fig. 2(a).

Fig. 2. Iteration schemes for $\exp(x)$: (a) traditional way with Newton iteration
for the inversion of $\ln(y)$, AGM iteration for $\ln(y)$, and nested iteration
for $\sqrt{a_j b_j}$; (b) new algorithm with common shift $N$ and re-use of internal
results.

As $|y_{i+1} - y_i| = O(2^{-2^i})$, we have $|\ln(y_{i+1}) - \ln(y_i)| = O(2^{-2^i})$. At a first
view, we cannot make use of this estimate, because the AGM for $\ln(y_{i+1})$ has
to be computed with doubled precision (compared with $\ln(y_i)$) and with a new
shift $m' \approx 2m$ which makes prior results unusable.
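
The Newton iteration for inverting the logarithm can be sketched with floats. In this illustration `math.log` stands in for the AGM-based logarithm, and the function name, starting value, and step count are assumptions; the sketch shows the outer iteration only, not the coupling developed in the rest of this section.

```python
import math

def exp_via_newton(x, y0=1.0, steps=8):
    """Invert ln with the iteration y <- y - (ln(y) - x) * y."""
    y = y0
    for _ in range(steps):
        y = y - (math.log(y) - x) * y
    return y
```

For inputs in $[0, \ln(2)]$, such as `exp_via_newton(0.5)`, a few steps reach double precision because the number of correct digits roughly doubles per step. Each step costs one logarithm, and in the traditional scheme every such logarithm is a full AGM iteration.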



Coupling of Two AGM Iterations. Our idea is to get rid of these two
drawbacks. Look at two “neighbouring” AGM iterations $(a_j)$, $(b_j)$ and $(c_j)$, $(d_j)$,
where $a_0 = c_0 = 1$, $b_0 = 4/(y \cdot 2^N)$, $d_0 = 4/(\bar{y} \cdot 2^N)$ with values $y$, $\bar{y}$ related
by $\bar{y} = y \cdot (1 + O(2^{-k}))$. Note that both iterations use the same shift $N$. As $y$
and $\bar{y}$ are closely related, we call these AGM iterations “neighbouring”. With
a little effort it can be proved that the relations $c_j = a_j \cdot (1 + O(2^{-k}))$ and
$d_j = b_j \cdot (1 + O(2^{-k}))$ hold for $j = 0, 1, 2, \ldots$. Thus, if $b_{j+1}$ (or a sufficiently
precise approximation to it) has been computed, it can serve as a starting value
for the computation of an approximation to $d_{j+1} = \sqrt{c_j d_j}$. In two or three
Newton steps we get a $2k$-digit approximation for $d_{j+1}$.

Consider the computation of $\ln(y)$ to relative precision $O(2^{-k})$ and $\ln(\bar{y})$ to
relative precision $O(2^{-2k})$, where $|\ln(y) - \ln(\bar{y})| = O(2^{-k})$. If the AGM
iteration for $\ln(y)$ has been computed with relative precision $k' = k + O(\log N)$ (the
additional $O(\log N)$ digits of precision are used again for avoiding cancellation)
and if we have stored the approximations for the values $b_1, b_2, b_3, \ldots$, we can
speed up the computation of $\ln(\bar{y})$, because the approximations to the square
roots $d_1 = \sqrt{c_0 d_0}$, $d_2 = \sqrt{c_1 d_1}, \ldots$ with relative error $O(2^{-2k'})$ can be computed
in constant time. This precision is sufficient to get a $2k$-digit approximation for
$\ln(\bar{y})$.
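
The re-use of stored square roots between two neighbouring AGM iterations can be sketched in double precision. Everything named here is an assumption made for illustration: the helper names, the use of the classical Newton step `(r + v / r) / 2` for the square root, and the idea of passing the stored pairs as a list. The sketch shows only the mechanism, not the precision bookkeeping of the paper.

```python
import math

def agm_sequence(a, b, steps):
    """Return the pairs (a_j, b_j), j = 0..steps, of an AGM iteration."""
    pairs = [(a, b)]
    for _ in range(steps):
        a, b = (a + b) / 2, math.sqrt(a * b)
        pairs.append((a, b))
    return pairs

def coupled_agm(c, d, neighbour_pairs, newton_steps=2):
    """Neighbouring AGM iteration whose square roots start from the stored b_{j+1}."""
    pairs = [(c, d)]
    for _, b_next in neighbour_pairs[1:]:
        v = c * d
        root = b_next  # starting value taken from the neighbouring iteration
        for _ in range(newton_steps):
            root = (root + v / root) / 2  # Newton step for the square root of v
        c, d = (c + d) / 2, root
        pairs.append((c, d))
    return pairs
```

If the two inputs differ by a relative perturbation of about $2^{-20}$, the stored $b_{j+1}$ already agrees with $d_{j+1}$ to roughly 20 bits, so two Newton steps are enough to reach double precision in every AGM step.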



Coupling of Newton Iteration and AGM Iterations. In the algorithm
for computing an approximation to $\exp(x)$, we find again “neighbouring” AGM
iterations. More precisely, the AGM iterations performed for $\ln(y_i)$ and $\ln(y_{i+1})$
are neighbouring for $i = 0, 1, 2, \ldots$.

To make use of the technique described above we synchronize the $O(\log n)$
many AGM iterations with a (sufficiently large) common shift $N = O(n)$.
Once we have computed the AGM iteration for $\ln(y_0)$, the further procedure
is straightforward.

This situation is shown in Fig. 2(b), where the additional iterations are caused
by the common shift $N = O(n)$. Ideally, we are searching for an iteration where
we use a dynamically changed shift. We have not yet found such an algorithm
which shares the same simultaneous upper bounds for the algebraic and bit
complexity as our presented algorithm.

The initial computation of an approximation for $\ln(y_0)$ (with shift $O(n)$ for
the AGM iteration) can be performed in $O(\log^2 n)$ rational operations, and as
we can now evaluate the exponential function with $O(\log n)$ Newton iteration
steps, each involving $O(\log n)$ AGM iteration steps which can be computed in
constant time, we have proved the following theorem.

Theorem 1. The exponential function has algebraic complexity $O(\log^2 n)$.
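
The operation count behind Theorem 1 can be written out term by term. The following display only restates the quantities named in the preceding paragraph; the annotations under the braces are added here for readability.

```latex
\[
  \underbrace{O(\log^2 n)}_{\text{initial AGM iteration for } \ln(y_0)}
  \;+\;
  \underbrace{O(\log n)}_{\text{Newton steps}}
  \cdot
  \underbrace{O(\log n)}_{\text{AGM steps per Newton step}}
  \cdot
  \underbrace{O(1)}_{\text{operations per AGM step}}
  \;=\; O(\log^2 n).
\]
```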

The initial computation of an approximation for $\ln(y_0)$ (with shift $O(n)$ for
the AGM iteration) can be performed with bit complexity $O(\mu(n) \log n)$. The
relative precision is roughly doubled in each iteration step of the Newton iteration
for the exponential function. Thus we can sum up the terms in a geometric series
as

$$O(\mu(n) \log n) + O((\mu(n) + \mu(n/2) + \mu(n/4) + \cdots + \mu(\log n)) \log n) = O(\mu(n) \log n).$$

We state this result as Theorem 2.

Theorem 2. The described algorithm has bit complexity $O(\mu(n) \log n)$.

Remark 1. For the purpose of simplicity, we did not describe how to perform
the computation of the initial AGM iteration with $O(\log n \log\log n)$ rational
operations. We also neglected the possibility of coupling the Newton iterations
for square roots in the sense of Section 3.3, i.e., using prior approximations for
the computation of approximations to the reciprocals $1/(2\sqrt{a_j b_j})$.

4 Practical Results 

Our algorithms were implemented in the TP assembly language.
To achieve fair results we follow the concept of typical time: We compare
the running times for inputs of (decimal) length $n$ with relative error $10^{-n}$ in
the outputs. The inputs are in [i, 1] and were taken from random generators.

Routine EXP is an implementation of the traditional Newton iteration for
the inversion of $\ln(y)$. Routine Coupled-EXP also computes this iteration, but
in the new “coupled” way, performing three Newton steps in the square root
computations. In practice, both routines have nearly the same running time. Our
new algorithm can be improved additionally by using the ideas from Remark 1.
This and a revision of the square root routine, such that two Newton steps (instead
of three) are sufficient, would save about 10-15% running time.

To make the measurements more informative and “system-independent”, we
made further comparisons with the packages Mathematica 2.2 [21] and MPFUN.
The routine MPEXPX (MPFUN) uses a similar algorithm as routine EXP.
The algorithm underlying the Exp-command of Mathematica is unknown to the
author, but the factor of $\approx 6 = 2 \cdot 3$ in the running times when doubling
the problem size makes the use of Taylor series with Karatsuba multiplication
probable.

Table 2 summarizes our practical results.



Table 2. Running time comparison (CPU time in sec) Mathematica - MPFUN
- TP. Timings do not include time for pre-computations of the constants π and
ln(2). All timings are for a SPARC 20/512 with 50 MHz.

n digits        Exp    MPEXPX       EXP   Coupled-EXP
    1 000       1.16      1.68      0.17      0.18
    2 000       6.6       4.9       0.5       0.5
    5 000      65.2      16.1       2.9       2.9
   10 000     367.       45.       11.       11.
   20 000    2147.      119.       31.       32.
   50 000   20730.      297.      117.      115.
  100 000  121552.      747.      294.      285.
  200 000              1541.      744.      735.
  500 000              7742.     1972.     1946.
1 000 000             17239.     4594.     4412.

5 Conclusion and Future Work 

A future version of our software package will include the improvements
mentioned above, thus saving about 10-15% running time in the range of some
thousand digits and above.

The idea presented in this paper seems to be applicable for many iteration- 
like computations. It may lead to further progress if we find a general theory 
on the repeated use of intermediate results in the computation of (elementary) 
functions.

Abstract. We investigate the notion that a system, or process, is an
acceptable implementation of another base or target process, in the case 
that they have different interfaces. Base processes can be thought of as 
specifications, or ideal processes operating in an error-free environment, 
while implementations model their actual realisation, possibly employ- 
ing a variety of fault-tolerant techniques. Using the CSP model, we re- 
late implementations to targets in terms of their observable behaviours, 
through interface abstraction. We obtain two basic results: realisability 
and compositionality. The former ensures an implementation up to in- 
terface abstraction can be put to good use, in the sense that plugging it 
into an appropriate environment yields a conventional implementation. 
Compositionality requires that a target made up of subcomponents can 
be implemented by assembling their respective implementations. 
Keywords: Theory of parallel and distributed computation, behaviour 
abstraction, communicating sequential processes. 



1 Introduction 

This work aims at formalising the notion that a system is an acceptable imple- 
mentation of another base or target system, in the case that the two systems have 
different interfaces. The goal is to provide satisfactory answers to such questions 
as: In what sense does a serial link implement an abstract byte channel? Can 
handshake performed by a pair of asynchronous channels be seen as an imple- 
mentation of a single synchronous channel? What does it mean that a replicated 
clock implements a single, precise clock? 

To our knowledge the proposed problem has been paid scant attention so
far. Yet, it has a clear practical interest extending e.g. to some fundamental
fault tolerance techniques. Indeed, we have applied restricted versions of the
proposed treatment to the analysis of NMR (N-Modular Redundancy),
whereby N possibly faulty copies of a base system mimic the behaviour of a
non-faulty base. Another application we investigated is the formalisation of
CA actions, a structuring concept for distributed fault-tolerant systems.

Whatever the answer to the issue raised, it is clear that the implementation 
and the target should be compared extensionally, i.e. in terms of their observable 
behaviours, as determined by the communication taking place at their respective 
interfaces. We add that, when interfaces differ, communication performed by
the implementation through its actual interface should first be interpreted as
communication through the target’s interface: the interpretation outcome is then 
suitable for comparison with the target’s intended behaviour. Since it is natural 
to consider the target’s communication as more abstract and understandable, 
we regard the noted interpretation as a form of abstraction or decoding. 

Conventional formal methods can cope with interface difference between
implementation and target only to a limited extent, by the two interpretation
patterns represented by (partial) interface renaming and hiding. No special
treatment is however needed in this case, because our notion of implementation
oriented to interface interpretation is easily reduced to conventional ones. For in
most formalisms hiding or renaming can be applied not only to actions (atomic
instances of communication), but also as an operator, say $f$, to a system as a
whole. Thus if $P$ is a formal object representing the implementation system,
$f(P)$ is also an object, whose communication is that of $P$ interpreted through
$f$. That $P$ implements a target system rendered as $K$ can be expressed as:

$$f(P) \mathrel{\mathcal{R}} K \tag{1}$$

where relation $\mathcal{R}$ depends on the formalism. It could be, e.g., an observation
preorder holding between process expressions $f(P)$ and $K$, or even satisfaction
of assertion $K$ by (the meaning of) $f(P)$ in a logic based framework.

However, the (rather obvious) reduction expressed by (1) is not applicable
whenever interpretation cannot be lifted to the system level, as an operator $f$
on systems. Indeed, it is often the case that interpretation can only be expressed
at the level of communication traces (sequences of actions), as a mapping of
traces at the implementation interface onto traces at the target interface; good
examples are provided by NMR majority voting, or by a serial link viewed as
an abstract byte channel. Furthermore, if correctness analysis is to be extended
to non-deterministic aspects of communication, these are also to be interpreted
across interface pairs. The following treatment will show precisely how useful
interpretations need to be characterised by mathematical structures called
extraction patterns. Anyway, it should be already clear that an extraction pattern,
say again $f$ with some abuse, is too complicated a device to be made into an
operator on systems. The approach summarised by (1) is therefore not viable.
Alternatively, we have to try and relate directly the implementation $P$ and the
target $K$, building straight into their relation the interpretation expressed by
the extraction pattern $f$. Formally, we try to establish:

$$P \mathrel{\mathcal{R}_f} K \tag{2}$$

What kind of device should the implementation relation (as we call it) $\mathcal{R}_f$ be?

For the problem at hand to be worth studying in general terms, $f$ should
play the role of an argument replacing a ‘formal parameter’ in a generic relation
‘scheme’ $\mathcal{R}_{\_}$ (this is the case for those proposed later). But we saw no obvious
or useful way of endowing standard relations, like e.g. observation preorders,
with an extraction pattern parameter, so it turned out that we were treading
on completely new ground here. Moreover, we did not find it easy to identify a
single implementation relation, proving immediately acceptable to intuition. In
fact, different relation schemes are likely to turn out to be useful for different
applications, system classes or interconnection topologies.

On the other hand, candidate implementation relations can be tied firmly to
intuition by placing two light but very natural requirements on them. The first,
accessibility or realisability, ensures that the abstraction built into the
implementation relation may be put to good use; in practice, this means that plugging an
implementation into an appropriate environment¹ should yield a conventional
implementation of the target. Distributivity or compositionality, the other
constraint on the implementation relation, requires it to distribute over system
composition; thus, a target composed of two connected systems may be implemented
by connecting two of their respective implementations.

Looking at (2) reveals we must still make some choices to complete the formal
framework of our investigation on the nature of the implementation relation
$\mathcal{R}_f$. The implementation $P$ must of course be rendered as a process expression
(of which language is a secondary issue). The target $K$ could be formalised as
either another process expression or as a logical assertion. However, given the
conceptual difficulties of relating $P$ to $K$ taking interpretation of communication
into account, we see no reason to widen the semantic gap between $P$ and $K$.

Finally, we need a semantic model for process expressions $P$ and $K$. As noted
above, the extraction pattern denoted by $f$ in (2) normally interprets
communication in terms of simple observable communication records, like traces, refusals
(sets of actions refused) and failures (pairs combining a trace with a possible
refusal after that trace). Thus, incorporating $f$ smoothly into the
implementation relation scheme requires that the meaning of a process should be firmly
and directly based on its observable communication records. This is certainly
true for the explicit FD (failure-divergence) model of CSP, much less so for
models that equate meaning to an observation equivalence class,
or for algebraic models that even depend on the set of process operators and
axioms adopted. Besides, the necessary choice of a specific equivalence or
operator and axiom set would be unattractive and potentially misleading in the
context of this investigation. In contrast, our treatment will not even require any
combining device except process composition.

The FD model adopted is expressive, well-understood and provides a wealth
of important theoretical results. A treatment could also be given within a pure
trace model, but this would only allow partial (functional) correctness to be
studied, whereas failures afford the ability to describe internal nondeterminism
(hence deadlock properties). Divergences are traces after which a process might
become uncontrollable, or diverge. While their presence can be regarded as a
nuisance for applications like protocol verification, in the context at hand it
provides a means of forbidding a network of implementations to diverge if the
network of the respective targets does not. Altogether, the FD model provides
a framework to reason about quite a thorough form of correctness.



¹ We build this environment with disturbers, feeding the implementation faulty
but sufficiently redundant input, and extractors interpreting its output.

Finally, CSP is also well-suited to describing the intended family of target
systems. Systems considered possess an interface that may be conceptually
divided into input and output channels. Moreover, input channels should never
refuse any input. These light restrictions do not imply a simpler asynchronous
communication model could be employed. For we do not want to impose the
same restrictions on implementations, both for the sake of generality and with
a view to modelling controlled faults allowed within fault tolerant systems.

Proofs of all the theorems stated later and related results are in [^. The 
reader ought to be familiar with the basic concepts of CSP m 

2 Preliminaries 

In the FD model of CSP a process $P$ is a triple $(\alpha P, \phi P, \delta P)$ where
$\alpha P$ (alphabet) is a non-empty finite set of actions, $\phi P$ (failures) is a
subset of $\alpha P^* \times 2^{\alpha P}$, and $\delta P$ (divergences) is a subset of
$\alpha P^*$. The set of traces of $P$ is $\tau P = \{t \mid (t,R) \in \phi P\}$. We use
structured actions of the form $b!v$, where $v$ is a message and $b$ a
communication channel. For every channel $b$, the alphabet $\alpha b$ of $b$ is
the set of all valid actions $b!v$. For a set of channels $B$,
$\alpha B = \bigcup_{b \in B} \alpha b$. We will associate with $P$ a set of channels,
$\mathrm{chan}\,P$, partitioned into input channels, $\mathrm{in}\,P$, and output
channels, $\mathrm{out}\,P$, and stipulate $\alpha P = \alpha\,\mathrm{chan}\,P$. For a
process $P$, let $\psi P = \{t \mid (t, \alpha\,\mathrm{out}\,P) \in \phi P\}$ (after
$t \in \psi P$, $P$ refuses any further output for the input within $t$). The
following notation is used (below, $t,u$ are traces; $b,b',b''$ channels; $T$ and
$T'$ sets of traces; and $B$ a set of channels):

— $t = \langle a_1, \ldots, a_n \rangle$ is the trace with $i$-th element $a_i$, and length $|t| = n$;

— $t \circ u$ is the trace obtained by appending $u$ to $t$;

— $\le$ is the prefix relation on traces;

— $t[b'/b]$ is a trace obtained from $t$ by replacing each action $b!v$ by $b'!v$;

— $t \lceil B$ is obtained by deleting from $t$ all the actions that do not occur on
channels in $B$; e.g., $\langle b''!3, b!1, b''!2, b!3, b'!3 \rangle \lceil \{b, b'\} = \langle b!1, b!3, b'!3 \rangle$;

— $f : T \to T'$ is: monotonic if $t \le u$, $t,u \in T$ imply $f(t) \le f(u)$; strict if
$f(\langle\rangle) = \langle\rangle$; a homomorphism if $t,u,t \circ u \in T$ imply $f(t \circ u) = f(t) \circ f(u)$.
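
The trace operations listed above can be illustrated with a small sketch. It is only an illustration under stated assumptions: a trace is encoded as a Python tuple of (channel, message) pairs standing for actions $b!v$, and the function names `project`, `rename` and `is_prefix` are hypothetical, not notation from the paper.

```python
from typing import Iterable, Tuple

Action = Tuple[str, int]      # (channel, message), standing for the action b!v
Trace = Tuple[Action, ...]    # assumed encoding of a trace


def project(t: Trace, channels: Iterable[str]) -> Trace:
    """Restriction of t to B: drop every action not on a channel in B."""
    keep = set(channels)
    return tuple(a for a in t if a[0] in keep)


def rename(t: Trace, new: str, old: str) -> Trace:
    """t[new/old]: replace each action old!v by new!v."""
    return tuple((new, v) if b == old else (b, v) for (b, v) in t)


def is_prefix(t: Trace, u: Trace) -> bool:
    """Prefix relation t <= u on traces."""
    return u[:len(t)] == t


# The projection example from the list above.
example = (("b''", 3), ("b", 1), ("b''", 2), ("b", 3), ("b'", 3))
projected = project(example, {"b", "b'"})
```

Here `projected` reproduces the example given for restriction, namely the trace consisting of b!1, b!3 and b'!3.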

We use these CSP operators: parallel composition $P \| Q$; hiding $P \backslash B$ of
communication at the channel set $B$; deterministic choice $P \,\Box\, Q$;
non-deterministic choice $P \sqcap Q$; and prefixing $a \to P$. Processes
$P_1, \ldots, P_n$ form a network if
$\mathrm{in}\,P_i \cap \mathrm{in}\,P_j = \emptyset = \mathrm{out}\,P_i \cap \mathrm{out}\,P_j$ for $i \ne j$;
if so, $P = P_1 \otimes \cdots \otimes P_n$ denotes the parallel composition of processes
$P_1, \ldots, P_n$ with interprocess communication hidden, i.e.
$P = (P_1 \| \cdots \| P_n) \backslash B$, where $B$ is the set of channels shared by at
least two different processes. We stipulate
$\mathrm{in}\,P = \bigcup_{i=1}^{n} \mathrm{in}\,P_i - \bigcup_{i=1}^{n} \mathrm{out}\,P_i$ and
$\mathrm{out}\,P = \bigcup_{i=1}^{n} \mathrm{out}\,P_i - \bigcup_{i=1}^{n} \mathrm{in}\,P_i$.
Throughout the paper, we will often refer to the processes $P$, $K$ and $L$ shown
in Figure 1.

Let $U$ and $V$ be two non-empty prefix-closed sets of traces over respectively
$\mathrm{in}\,P$ and $\mathrm{out}\,P$. Then $P$ is input-guarded w.r.t. $U$ and $V$ if, for
every set of traces $T \subseteq \tau P$ satisfying $T \lceil \mathrm{in}\,P \subseteq U$:
(IG1) $T \lceil \mathrm{out}\,P \subseteq V$; and (IG2) if $T$ is infinite then so is
$T \lceil \mathrm{in}\,P$. We denote this by $P \in \mathrm{IG}(U,V)$.

Fig. 1. Three generic processes (where and n > 1).



We consider as base processes the class GIO of general input-output processes.
$P \in \mathrm{GIO}$ iff it is input-guarded, i.e.
$P \in \mathrm{IG}(\alpha\,\mathrm{in}\,P^*, \alpha\,\mathrm{out}\,P^*)$, and never refuses
any input, i.e. $R \cap \alpha\,\mathrm{in}\,P = \emptyset$, for all $(t, R) \in \phi P$. An
unbounded deterministic buffer with input channel $p$ and output channel $q$,
denoted by $\mathrm{buffer}_{pq}$, belongs to GIO.



3 Extraction Patterns 

Consider two processes, Sgl and Buf, as shown in Figure 2(a). The former
generates a single binary pulse, belonging to $\mathbb{B} = \{0, 1\}$, on its output
channel $p$, then terminates; the latter is a buffer process forwarding signals
received on its input channel $p$. $\mathrm{Sgl} = \Box_{v \in \mathbb{B}}\; p!v \to \mathrm{stop}$
is a GIO process, and so is $\mathrm{Buf} = \mathrm{buffer}_{pq}$. Suppose now Sgl and Buf
are implemented using processes $\mathrm{Sgl}_0$ and $\mathrm{Buf}_0$ respectively,
connected by two channels, $r$ and $s$, as shown in Figure 2(b).
$\mathrm{Sgl}_0 = \Box_{v \in \mathbb{B}}\, (r!v \to \mathrm{stop} \;\|\; s!v \to \mathrm{stop})$, i.e.
the signal is now duplicated by $\mathrm{Sgl}_0$ and the two copies sent along $r$
and $s$. However, $\mathrm{Buf}_0$ (CSP definition omitted) only accepts one of the
copies and passes it on, ignoring the other one. The scheme clearly works, since
$\mathrm{Sgl} \otimes \mathrm{Buf} = \mathrm{Sgl}_0 \otimes \mathrm{Buf}_0$. Suppose now
communication is imperfect and two types of faulty behaviour can occur:
$\mathrm{Sgl}_1 = \mathrm{Sgl}_0 \sqcap \mathrm{stop}$ and
$\mathrm{Sgl}_2 = \mathrm{Sgl}_0 \sqcap \Box_{v \in \mathbb{B}}\; r!v \to \mathrm{stop}$. I.e.,
$\mathrm{Sgl}_1$ can break down completely, refusing to output any signals, while
$\mathrm{Sgl}_2$ can fail in a way that, although channel $s$ is blocked, $r$ can
still transmit the signal. (This could model the following realistic situation:
to improve performance, the 'slow' channel $p$ is replaced by two channels, a
high-speed unreliable channel $s$ and a slow but reliable backup channel $r$.)
Since $\mathrm{Sgl} \otimes \mathrm{Buf} = \mathrm{Sgl}_2 \otimes \mathrm{Buf}_0$ but
$\mathrm{Sgl} \otimes \mathrm{Buf} \ne \mathrm{Sgl}_1 \otimes \mathrm{Buf}_0$, it follows that
$\mathrm{Sgl}_2$ is a much 'better' implementation of the Sgl process than
$\mathrm{Sgl}_1$. We will now analyse the differences between $\mathrm{Sgl}_1$ and
$\mathrm{Sgl}_2$ while introducing informally some basic concepts needed later.

Fig. 2. Two base processes and their implementations.

We start by observing that the output of $\mathrm{Sgl}_2$ can be thought of as
adhering to the following two rules: (R1) the communication over $r$ and that
over $s$ are consistent; and (R2) communication over $r$ is reliable, but there is
no such guarantee for $s$. The output produced by $\mathrm{Sgl}_1$ satisfies R1 but
fails to satisfy R2. To express this formally, R1 and R2 must be set in some
precise notation.

To capture the behavioural relationship between Sgl and $\mathrm{Sgl}_2$ we will
employ an (extraction) mapping extr which, given a trace over $r$ and $s$,
interprets it as a trace over $p$. E.g.,
$\langle\rangle \mapsto \langle\rangle$, $\langle r!0 \rangle \mapsto \langle p!0 \rangle$,
$\langle s!0 \rangle \mapsto \langle p!0 \rangle$, $\langle s!1, r!1 \rangle \mapsto \langle p!1 \rangle$
and $\langle r!1, s!1 \rangle \mapsto \langle p!1 \rangle$. Note that the extraction mapping
need only be defined for traces satisfying R1. But the extraction mapping alone
cannot suffice to identify the 'correct' implementation of Sgl in the presence of
faults, since $\tau\mathrm{Sgl} = \mathrm{extr}(\tau\mathrm{Sgl}_1) = \mathrm{extr}(\tau\mathrm{Sgl}_2)$.
What one also needs is an ability to relate the refusals of $\mathrm{Sgl}_1$ with
the refusals of the base process Sgl. This is much harder. For, simply applying
extr also to refusals of $\mathrm{Sgl}_2$ will not yield failures of Sgl. E.g.
$(\langle\rangle, \{s!0\})$ is in $\phi\mathrm{Sgl}_2$, but the pair
$(\mathrm{extr}(\langle\rangle), \{\mathrm{extr}(s!0)\}) = (\langle\rangle, \{p!0\})$ is not in
$\phi\mathrm{Sgl}$: i.e., crude extraction of refusals is not going to work. A more
sophisticated device is needed, in the form of another mapping, ref,
constraining the possible refusals a process can exhibit after a given trace.
This will help prevent, e.g., a sender process from blocking if its transmission
is still incomplete. In the example at hand, this roughly amounts to requiring
that communication on channel $r$ should not be blocked before the sender,
$\mathrm{Sgl}_2$, has sent a signal over it. For example, we will stipulate that
$\mathrm{ref}(\langle s!0 \rangle)$ must not comprise a refusal containing both $r!0$ and
$r!1$. We will denote by dom the traces conveying information that can be
regarded as complete. E.g. $\langle s!0 \rangle$ and $\langle s!1 \rangle$ should not belong
to dom for, according to R2, we can expect them to be completed by some action
over the reliable channel $r$.

The last notion we need in order to relate an implementation and a target
is a partial inverse inv of the extraction mapping. This will be used to ensure
all the traces of a base process can be extracted from the traces of any
implementation. Formally, an extraction pattern is defined for two non-empty
sets of channels, $B$ and $B'$, respectively called sources and targets. It is a
tuple ep = (dom, extr, ref, inv) such that:

EP1 dom is a set of traces over sources; its prefix-closure is denoted Dom.
EP2 extr is a strict monotonic mapping for traces in Dom; for every $t$, extr($t$)
is a trace over targets.

EP3 ref is a mapping for traces in Dom; for every $t$, ref($t$) is a non-empty
family of proper subsets of $\alpha B$ such that $Y \subseteq X \in \mathrm{ref}(t)$ implies
$Y \in \mathrm{ref}(t)$.
EP4 inv is a trace homomorphism from traces over targets to traces in Dom;
for every trace $w$ over targets, $\mathrm{extr}(\mathrm{inv}(w)) = w$.

The extraction mapping is monotonic, as receiving more information should not
decrease the current knowledge about information exchange.
$\alpha B \notin \mathrm{ref}(t)$ means that for the unfinished communication $t$ we do
not allow the sender to refuse all possible transmission. Since inv is a trace
homomorphism, it suffices to define it for single actions over targets only.
Note that ep is not decorated with source and target channels explicitly. We
assume these can always be retrieved from the domains of extr and inv. By
default, different extraction patterns have disjoint sources and disjoint
targets, and the components of extraction patterns can be annotated to avoid
ambiguity. A basic extraction pattern is one with a singleton target channel.

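Let $\mathrm{ep}_1$ and $\mathrm{ep}_2$ be two extraction patterns with sources (targets)
respectively $B_1$ and $B_2$ ($B'_1$ and $B'_2$). Then
$\mathrm{ep} = \mathrm{ep}_1 \oplus \mathrm{ep}_2$ is defined as the extraction pattern with
sources $B = B_1 \cup B_2$ and targets $B' = B'_1 \cup B'_2$ such that:

$$
\begin{array}{rcl}
\mathrm{dom} &=& \{t \in (\alpha B_1 \cup \alpha B_2)^* \mid t \lceil B_1 \in \mathrm{dom}_1 \wedge t \lceil B_2 \in \mathrm{dom}_2\} \\[4pt]
\mathrm{ref}(t) &=& \{R \subseteq \alpha B \mid R \cap \alpha B_1 \in \mathrm{ref}_1(t \lceil B_1) \vee R \cap \alpha B_2 \in \mathrm{ref}_2(t \lceil B_2)\} \\[4pt]
\mathrm{extr}(t \circ \langle a \rangle) &=&
\begin{cases}
\mathrm{extr}(t) \circ u_1 & \text{if } a \in \alpha B_1 \wedge \mathrm{extr}_1(t \lceil B_1 \circ \langle a \rangle) = \mathrm{extr}_1(t \lceil B_1) \circ u_1 \\
\mathrm{extr}(t) \circ u_2 & \text{if } a \in \alpha B_2 \wedge \mathrm{extr}_2(t \lceil B_2 \circ \langle a \rangle) = \mathrm{extr}_2(t \lceil B_2) \circ u_2
\end{cases} \\[4pt]
\mathrm{inv}(a) &=& \mathrm{inv}_i(a) \quad \text{if } a \in \alpha B'_i, \text{ for } i = 1,2
\end{array}
$$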



The $\oplus$ operation is both associative and commutative and provides a way of
building complex extraction patterns from simpler ones. An identity extraction
pattern for a channel $b$, $\mathrm{id}_b$, is one for which $B = B' = \{b\}$,
$\mathrm{dom} = \mathrm{Dom} = \alpha b^*$, $\mathrm{extr}(t) = \mathrm{inv}(t) = t$ and
$\mathrm{ref}(t) = \{R \mid \alpha b \not\subseteq R\}$. For a set of channels
$B'' = \{b_1, \ldots, b_k\}$, $\mathrm{id}_{B''} = \mathrm{id}_{b_1} \oplus \cdots \oplus \mathrm{id}_{b_k}$.

For our example implementations of Sgl and Buf, the extraction pattern
needed, fs, is defined by:
$\mathrm{dom} = \{t \in (\alpha r \cup \alpha s)^* \mid (t \lceil s)[p/s] \le (t \lceil r)[p/r]\}$,
$\mathrm{ref}(t) = \{R \mid \alpha r \not\subseteq R\}$,
$\mathrm{extr}(t) = \max\{(t \lceil s)[p/s], (t \lceil r)[p/r]\}$ and $\mathrm{inv}(p!v) = \langle r!v \rangle$.
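
To make the pattern fs concrete, the following sketch evaluates its dom, extr and inv components on small traces. It is an illustration only, under these assumptions: traces are tuples of (channel, message) pairs, `max` is read as "the longer of two prefix-comparable traces", and the helper names are hypothetical. The ref component is omitted.

```python
from typing import Tuple

Action = Tuple[str, int]      # (channel, message), standing for b!v
Trace = Tuple[Action, ...]


def as_p(t: Trace, channel: str) -> Trace:
    """(t restricted to `channel`)[p/channel]: that channel's actions, renamed to p."""
    return tuple(("p", v) for (b, v) in t if b == channel)


def is_prefix(t: Trace, u: Trace) -> bool:
    return u[:len(t)] == t


def in_dom(t: Trace) -> bool:
    """dom of fs: what was sent over s is a prefix of what was sent over r."""
    return is_prefix(as_p(t, "s"), as_p(t, "r"))


def extr(t: Trace) -> Trace:
    """extr of fs: the longer of the two renamed projections.

    Assumes t lies in Dom, where the two projections are prefix-comparable.
    """
    via_s, via_r = as_p(t, "s"), as_p(t, "r")
    if not (is_prefix(via_s, via_r) or is_prefix(via_r, via_s)):
        raise ValueError("communication over r and s is inconsistent (R1 violated)")
    return max(via_s, via_r, key=len)


def inv(w: Trace) -> Trace:
    """inv of fs, extended homomorphically to traces: p!v is mapped to r!v."""
    return tuple(("r", v) for (_, v) in w)
```

With this encoding, `extr` reproduces the five sample mappings given earlier for Sgl, the trace consisting only of s!0 is rejected by `in_dom`, and `extr(inv(w))` returns `w`, as EP4 requires.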



4 Extraction in Acyclic Networks 

Suppose that we implemented the base GIO process P using another process 
Q. The correctness of the implementation will be expressed in terms of two 
extraction patterns, ep and ep'. The former (with sources in Q and targets in P) 
will be used to relate the communication on the input channels of P and Q; ep' 
(with sources outQ and targets outP) will serve a similar purpose for the output 
channels. There are four main properties Q has to satisfy. Firstly, if a trace t of
Q projected on its input channels can be interpreted by ep, then it should be
possible to interpret its projection on the output channels by ep' (see WI1 below
and IG1). Secondly, when connected to another process and supplied with valid
input (i.e. belonging to Dom), Q should not introduce divergence (this rules out
infinite uninterrupted communication on output channels; see WI1 and IG2),
and should not refuse ‘proper’ input (this rules out refusals which might lead 
to a deadlock with another process that provides input to Q and whose refusals 
are constrained by ep — see WI2). Thirdly, if Q has received a complete input 
from process S and is to output to process W, then Q should not deadlock 
with W until it has sent W all the output produced (WI3). Finally, we ensure 
the functional behaviour of P (in terms of traces) can be realised by Q (WI4). 
Formally, Q is a weak implementation of P, denoted Q G WI(P, ep, ep'), if: 

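WI1 $Q \in \mathrm{IG}(\mathrm{Dom}, \mathrm{Dom}')$.

WI2 If $(t, R) \in \phi Q$ is such that $t \lceil \mathrm{in}\,Q \in \mathrm{Dom}$ then
$\alpha\,\mathrm{in}\,Q - R \notin \mathrm{ref}(t \lceil \mathrm{in}\,Q)$.

WI3 If $(t, R) \in \phi Q$ is such that $t \lceil \mathrm{in}\,Q \in \mathrm{dom}$ and
$\alpha\,\mathrm{out}\,Q \cap R \notin \mathrm{ref}'(t \lceil \mathrm{out}\,Q)$ then
$t \lceil \mathrm{out}\,Q \in \mathrm{dom}'$ and $\mathrm{extr}_{\mathrm{ep} \oplus \mathrm{ep}'}(t) \in \psi P$.

WI4 $\mathrm{inv}_{\mathrm{ep} \oplus \mathrm{ep}'}(\tau P) \subseteq \tau Q$.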

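One can see that $P \in \mathrm{WI}(P, \mathrm{id}_{\mathrm{in}\,P}, \mathrm{id}_{\mathrm{out}\,P}) \subseteq \mathrm{GIO}$.
For our example processes, we have $\mathrm{Sgl}_2 \in \mathrm{WI}(\mathrm{Sgl}, \mathrm{fs})$
(we omitted ep since there are no input channels) and
$\mathrm{Sgl}_1 \notin \mathrm{WI}(\mathrm{Sgl}, \mathrm{fs})$. Indeed,
$\mathrm{Sgl}_1 \notin \mathrm{WI}(\mathrm{Sgl}, \mathrm{ep}')$ for any extraction pattern ep'
since $(\langle\rangle, \alpha r \cup \alpha s) \in \phi\mathrm{Sgl}_1$ and
$\alpha r \cup \alpha s \notin \mathrm{ref}'(\langle\rangle)$ (see EP3).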

To support modular treatment of networks of implementations we will now
establish two properties, namely compositionality and realisability. Referring to
Figure 1, the former is formulated thus.

Theorem 1. Let $H = \emptyset$ and $c,d,e,f,g$ be extraction patterns whose targets
are respectively the channel sets $C, D, E, F, G$. If
$M \in \mathrm{WI}(K, c, d \oplus e)$ and $N \in \mathrm{WI}(L, d \oplus f, g)$ then
$M \otimes N \in \mathrm{WI}(K \otimes L, c \oplus f, e \oplus g)$. □

Realisability is captured below where, for GIO processes $P$ and $Q$ with the same
input and output channels, $Q \preceq P$ if $\tau Q = \tau P$ and $\psi Q \subseteq \psi P$.

Theorem 2. If $Q \in \mathrm{WI}(P, \mathrm{id}_{\mathrm{in}\,P}, \mathrm{id}_{\mathrm{out}\,P})$ then $Q \preceq P$. □

Why can this result be regarded as expressing an adequate notion of realisability?
First, note that given $Q' \in \mathrm{WI}(P, \mathrm{ep}, \mathrm{ep}')$, where ep and ep' are
general extraction patterns, compositionality in many situations allows
$Q \in \mathrm{WI}(P, \mathrm{id}_{\mathrm{in}\,P}, \mathrm{id}_{\mathrm{out}\,P})$ to be constructed,
e.g., using extractors and disturbers. As for the $\preceq$ relation, think of an
environment for processes $Q$ and $P$ which is represented by an arbitrary GIO
process Rev which is to receive the results produced by $Q$ and $P$, as shown in
Figure 3. It turns out that $Q \preceq P$ implies:
$\tau(Q \otimes \mathrm{Rev}) = \tau(P \otimes \mathrm{Rev})$ and
$\phi(Q \otimes \mathrm{Rev}) \subseteq \phi(P \otimes \mathrm{Rev})$. The former property
means that as far as functionality is concerned, $Q \otimes \mathrm{Rev}$ and
$P \otimes \mathrm{Rev}$ are equivalent processes. The latter property means that
$Q \otimes \mathrm{Rev}$ is at least as deterministic a process as $P \otimes \mathrm{Rev}$.
This makes $Q \otimes \mathrm{Rev}$ at least as good a process as $P \otimes \mathrm{Rev}$
(and possibly a much better one) for use in the actual implementation. Note
that the underlying philosophy of CSP is to allow processes that are as
non-deterministic as possible for specification, but as deterministic as possible
for actual implementation; our approach is thus in line with that philosophy.
Hence Theorem 2 captures an adequate notion of realisability.

The above realisability result can be strengthened if we assume that
$t \circ \langle a \rangle \notin \tau P$, for all $t \in \psi P$ and $a \in \alpha\,\mathrm{out}\,P$
(this means $P$ can refuse to generate any output only if its functional
specification in terms of traces rules this out). We denote $P \in \mathrm{GIO}'$ in
such a case. GIO' is a subset of GIO, but is not closed under composition
$\otimes$. It is easily seen that if $P \in \mathrm{GIO}'$ then
$Q \otimes \mathrm{Rev} = P \otimes \mathrm{Rev}$.

The compositionality result of Theorem 1 is easily extended to process
networks. Indeed, let $P_1, \ldots, P_k$ be a network of base GIO processes, and let
$Q_i$ be a weak implementation of $P_i$, for all $i$. Using Theorem 1 and
induction it is straightforward to prove that the network $Q_1, \ldots, Q_k$
represents a weak implementation of the network $P_1, \ldots, P_k$. However, the
possible topology of the latter is restricted by the $H = \emptyset$ hypothesis of
Theorem 1. This implies the base network must be acyclic in the obvious sense.

Fig. 3. Realisability of GIO processes

5 Extraction in Cyclic Networks 

As has been demonstrated, weak implementation is not sufficient to deal with
cyclic networks of processes. We now take the pair of base processes $K$ and
$L$, as shown in Figure 1, without assuming $H = \emptyset$. The first problem we
face is that $K \otimes L$ will not, in general, be a GIO process even though both
$K$ and $L$ are. Our treatment is therefore restricted to well-behaved cases
which do not lead to divergence. We say that the processes $K$ and $L$ are
compatible if for every infinite set of traces $T \subseteq \tau(K \| L)$, the set
$T \lceil \mathrm{in}\,(K \otimes L)$ is also infinite. It follows that, if $K$ and $L$ are
compatible GIO processes, then $K \otimes L \in \mathrm{GIO}$.

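With the same assumptions as in the definition of WI, $Q$ is a strong
implementation of $P$, denoted $Q \in \mathrm{SI}(P, \mathrm{ep}, \mathrm{ep}')$, if the
following hold:

SI1 If $T \subseteq \tau Q$ and $T \lceil \mathrm{in}\,Q \subseteq \mathrm{Dom}$ then:
$T \lceil \mathrm{out}\,Q \subseteq \mathrm{Dom}'$; if $T$ is infinite then so is
$\mathrm{extr}(T \lceil \mathrm{in}\,Q)$; and $\mathrm{extr}_{\mathrm{ep} \oplus \mathrm{ep}'}(T) \subseteq \tau P$.

SI2 If $(t, R) \in \phi Q$ is such that $t \lceil \mathrm{in}\,Q \in \mathrm{Dom}$ then
$\alpha\,\mathrm{in}\,Q - R \notin \mathrm{ref}(t \lceil \mathrm{in}\,Q)$.

SI3 If $(t, R) \in \phi Q$ is such that $t \lceil \mathrm{in}\,Q \in \mathrm{Dom}$ and
$\alpha\,\mathrm{out}\,Q \cap R \notin \mathrm{ref}'(t \lceil \mathrm{out}\,Q)$ then:
$t \lceil \mathrm{out}\,Q \in \mathrm{dom}'$; and
$t \in \mathrm{dom}_{\mathrm{ep} \oplus \mathrm{ep}'} \Rightarrow \mathrm{extr}_{\mathrm{ep} \oplus \mathrm{ep}'}(t) \in \psi P$.

SI4 $\mathrm{inv}_{\mathrm{ep} \oplus \mathrm{ep}'}(\tau P) \subseteq \tau Q$.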

The main difference between this and the previous definition is that IG2 in the
definition of input-guardedness has been replaced by a stronger condition which
can be interpreted as input-guardedness relative to extraction. Moreover, we
have strengthened WI3 and added a third condition in SI1 to explicitly state
that extracted valid traces of Q are traces of P. Being a strong implementation
implies also being a weak implementation (but not vice versa); hence
realisability follows from Theorem 2. Compositionality is as follows.

Theorem 3. Let $K$ and $L$ be two compatible GIO processes, and let $c$, $d$, $e$,
$f$, $g$ and $h$ be extraction patterns whose targets are respectively the
channel sets $C$, $D$, $E$, $F$, $G$ and $H$. If
$M \in \mathrm{SI}(K, c \oplus h, d \oplus e)$ and $N \in \mathrm{SI}(L, d \oplus f, g \oplus h)$ then
$M \otimes N \in \mathrm{SI}(K \otimes L, c \oplus f, e \oplus g)$. □

Thus strong implementation is suitable for dealing with cyclic process networks, 
provided the base network can be decomposed down to single network compo- 
nents in such a way that at each stage compatibility is satisfied. 

6 Concluding Remarks 

We have presented a framework for relating a target process with its
implementation, when each communicates with the environment through a possibly
different interface, and/or pattern of information exchange. The framework
supports a natural and expressive way of specifying fault assumptions.
Moreover, it should be applicable to a variety of distributed architectures,
provided the base processes employed as high level specifications adhere to the
GIO properties (cf. Section 2). It should be noted that although the GIO process
class is related to some of the existing models of communication with
non-blocking input (such as IO-automata), this is not necessarily the case
for processes implementing them. Therefore CSP, being a more general model,
seems to provide a more appropriate framework.

This paper extends previous results in two ways, by introducing non-base
extraction patterns and a more powerful notion of strong implementation for
cyclic networks of processes. Future work will concentrate on extending the
theory to cover non-GIO base processes, and on developing algorithms for
verifying the implementation relations.

Abstract. Lossy VASS (vector addition systems with states) are defined
as a subclass of VASS in analogy to lossy FIFO-channel systems. They 
can be used to model concurrent systems with unreliable communication. 
We analyze the decidability of model checking problems for lossy systems 
and several branching-time and linear-time temporal logics. We present 
an almost complete picture of the decidability of model checking for 
normal VASS, lossy VASS and lossy VASS with test for zero. 



1 Introduction 

VASS (vector addition systems with states) can model systems communicating
through unbounded unordered buffers, and hence they can be seen as
abstractions of FIFO-channel systems, when only the number of messages in
the channels is relevant and not their ordering. Communicating systems are
often analyzed under the assumption that they communicate through unreliable
channels. Hence, we consider lossy models of communicating systems, i.e.
models where messages can be lost. Recent work has studied lossy unbounded
FIFO-channel systems. The reachability problem is decidable for
these models, which implies the decidability of the verification problem for safety
properties. However, liveness properties cannot be checked for lossy FIFO-channel
systems, except for very special ones like single eventualities. In particular, it is
impossible to model check lossy channel systems under fairness conditions. Here
we study verification problems for VASS and VASS with inhibitor arcs (counter
machines) under the assumption of lossiness, i.e. the contents of a place/counter
can spontaneously get lower at any time.

Using a previously introduced approach, it can be shown very easily
that the set $pre^*(S)$ of predecessors of any set of configurations $S$ is effectively
constructible for lossy VASS even with inhibitor arcs, and that this set can be
represented by simple linear constraints (SC for short), where integer variables
can be compared only with constants. Moreover, for lossy VASS, the set $post^*(S)$
of successors is SC definable and effectively constructible, but interestingly, for
lossy VASS with inhibitor arcs these sets are not constructible although they are
SC definable.

Local model checking, or simply model checking, consists in deciding whether a
given configuration of a system satisfies a given formula of a temporal logic, and
global model checking consists in constructing the set of all configurations that
satisfy a given formula. We address these problems for a variety of linear-time
and branching-time properties. We express these properties in a temporal logic,
called AL (Automata Logic), which is based on automata on finite and infinite
sequences to specify path properties (in the spirit of ETL), and the use of path
quantifiers to express branching-time properties (as in ECTL*). The
basic state predicates in this logic are SC constraints.

C. Meinel and S. Tison (Eds.): STACS'99, LNCS 1563, pp. 323-ESI 1999.
(c) Springer-Verlag Berlin Heidelberg 1999

Ahmed Bouajjani and Richard Mayr

Our main positive result is that for lossy VASS, global model checking is
decidable for the logic $\exists$AL with only upward closed constraints, and dually for
$\forall$AL with downward closed constraints ($\forall$AL and $\exists$AL are the universal and
existential positive fragments of AL. They subsume respectively the corresponding
well-known fragments $\forall$CTL* and $\exists$CTL* of the logic CTL*). When only
infinite paths are considered our decidability result also holds for normal VASS.
A corollary is that linear-time properties on finite and infinite paths (on infinite
paths only) are decidable for lossy VASS (normal VASS). We can even construct
the set of all the configurations satisfying these properties. This generalizes
an earlier result where only model checking is considered. Notice also that
$\forall$AL is strictly more expressive than all linear-time temporal logics.

These decidability results break down if we relax any of the restrictions: model
checking becomes undecidable if we consider $\forall$AL or $\exists$AL formulae with both
downward and upward closed constraints, or if we consider lossy VASS with
inhibitor arcs. Also, even if we use only propositional constraints in the logic
(i.e., only constraints on control locations) the use of negation must be restricted:
model checking is undecidable for CTL and lossy VASS. However, it is decidable
for the fragments EF and EG of CTL even for lossy VASS with inhibitor arcs,
but surprisingly, global model checking is undecidable for EG and lossy VASS
(while it is decidable for EF and lossy VASS with inhibitor arcs). As a side effect
we obtain that normal VASS (Petri nets) and lossy VASS with inhibitor arcs
(lossy counter machines) are incomparable.

The missing proofs can be found in the full version of the paper. 



2 Vector Addition Systems with States 

Definition 1. An n-dim VASS $\mathcal{S}$ is a tuple $(\Sigma, X, Q, \delta)$ where $\Sigma$ is a
set of action labels, $X$ is a set of variables such that $|X| = n$, $Q$ is a finite
set of control states, and $\delta$ is a finite set of transitions of the form
$(q_1, a, \Delta, q_2)$ where $a \in \Sigma$ and $\Delta \in \mathbb{Z}^n$.

A configuration of $\mathcal{S}$ is a pair $(q, u)$ where $q \in Q$ and $u \in \mathbb{N}^n$. Let
$C(\mathcal{S})$ be the set of configurations of $\mathcal{S}$. Given a configuration
$s = (q, u)$, we let $State(s) = q$ and $Val(s) = u$.

We define a transition relation $\xrightarrow{a}$ on configurations as follows:
$(q_1, u_1) \xrightarrow{a} (q_2, u_2)$ iff $\exists \tau = (q_1, a, \Delta, q_2) \in \delta.\ u_2 = u_1 + \Delta$.
Let $post_\tau((q_1, u_1))$ (resp. $pre_\tau((q_2, u_2))$) denote the configuration
$(q_2, u_2)$ (resp. $(q_1, u_1)$), i.e., the immediate successor (resp. predecessor)
of $(q_1, u_1)$ (resp. $(q_2, u_2)$) by the transition $\tau$. Then, we let $post$
(resp. $pre$) denote the union of the $post_\tau$'s (resp. $pre_\tau$'s) for all the
transitions $\tau \in \delta$. In other words,
$post((q, u)) = \{(q', u') : \exists a \in \Sigma.\ (q, u) \xrightarrow{a} (q', u')\}$ and
$pre((q, u)) = \{(q', u') : \exists a \in \Sigma.\ (q', u') \xrightarrow{a} (q, u)\}$.

Let $post^*$ and $pre^*$ be the reflexive-transitive closures of $post$ and $pre$.

Given a configuration $s$, a run of the system $\mathcal{S}$ starting from $s$ is a
finite or infinite sequence $s_0 a_0 s_1 a_1 \cdots s_n$ such that $s = s_0$ and, for
every $i \ge 0$, $s_i \xrightarrow{a_i} s_{i+1}$. We denote by $Run_f(s, \mathcal{S})$ (resp.
$Run_\omega(s, \mathcal{S})$) the set of finite (resp. infinite) runs of $\mathcal{S}$
starting from $s$.

A lossy VASS is defined as a VASS with a weak transition relation
$\stackrel{a}{\Longrightarrow}$ on configurations. We define the relation as follows:
$(q_1, u_1) \stackrel{a}{\Longrightarrow} (q_2, u_2)$ iff
$\exists u'_1, u'_2 \in \mathbb{N}^n.\ u_1 \ge u'_1$, $(q_1, u'_1) \xrightarrow{a} (q_2, u'_2)$, and $u'_2 \ge u_2$.
The weak transition relation induces corresponding notions of runs, successor
and predecessor functions defined by considering the weak transition relation
instead of $\xrightarrow{a}$.
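
The difference between the exact and the weak transition relation can be seen on a small example. The sketch below is a brute-force illustration, not an efficient procedure: configurations are (state, vector) pairs, transitions are tuples $(q_1, a, \Delta, q_2)$, and the function names `below`, `post` and `lossy_post` are assumptions made for this example.

```python
from itertools import product
from typing import List, Set, Tuple

Vec = Tuple[int, ...]
Config = Tuple[str, Vec]
Transition = Tuple[str, str, Vec, str]   # (q1, a, delta, q2)


def below(u: Vec) -> List[Vec]:
    """All vectors v of naturals with v <= u componentwise."""
    return list(product(*(range(x + 1) for x in u)))


def post(config: Config, transitions: List[Transition]) -> Set[Config]:
    """Exact successors: u2 = u1 + delta, which must remain in N^n."""
    q, u = config
    out = set()
    for (q1, _a, d, q2) in transitions:
        v = tuple(x + y for x, y in zip(u, d))
        if q1 == q and all(x >= 0 for x in v):
            out.add((q2, v))
    return out


def lossy_post(config: Config, transitions: List[Transition]) -> Set[Config]:
    """Weak successors: lose tokens, fire a transition, lose tokens again."""
    q, u = config
    out = set()
    for u_before in below(u):
        for (q2, u_after) in post((q, u_before), transitions):
            out.update((q2, w) for w in below(u_after))
    return out


transitions = [("q0", "a", (-1, 2), "q1")]
exact = post(("q0", (1, 0)), transitions)
weak = lossy_post(("q0", (1, 0)), transitions)
```

In this example the exact relation yields a single successor, while the weak relation yields every configuration below it in the same control state.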

Definition 2. We order vectors of natural numbers by
$(u_1, \ldots, u_n) \le (v_1, \ldots, v_n)$ iff $\forall i \in \{1, \ldots, n\}.\ u_i \le v_i$.

Given a set $S \subseteq \mathbb{N}^n$, we denote by $min(S)$ the set of minimal elements of
$S$ w.r.t. the relation $\le$.

Let $S \subseteq \mathbb{N}^n$. Then, $S$ is upward (resp. downward) closed iff
$\forall u \in \mathbb{N}^n.\ u \in S \Rightarrow (\forall v \in \mathbb{N}^n.\ v \ge u$ (resp. $v \le u$) $\Rightarrow v \in S)$.
Given a set $S \subseteq \mathbb{N}^n$, we denote by $S\!\uparrow$ (resp. $S\!\downarrow$) the upward
(resp. downward) closure of $S$, i.e., the smallest upward (resp. downward)
closed set which contains $S$.



Lemma 3. Every set $S \subseteq \mathbb{N}^n$ has a finite number of minimal elements. A set
$S$ is upward closed if and only if $S = min(S)\!\uparrow$. The union and the
intersection of two upward (resp. downward) closed sets is an upward (resp.
downward) closed set. The complement of an upward closed set is downward
closed and vice-versa.
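
Lemma 3 is what makes upward closed sets finitely representable: it suffices to store the minimal elements. A small sketch, assuming the set is given as a finite Python set of tuples (the lemma itself also covers infinite sets) and using hypothetical function names:

```python
from typing import Iterable, Set, Tuple

Vec = Tuple[int, ...]


def leq(u: Vec, v: Vec) -> bool:
    """Componentwise ordering on vectors of naturals."""
    return all(a <= b for a, b in zip(u, v))


def minimal_elements(s: Set[Vec]) -> Set[Vec]:
    """min(S): elements of S with no strictly smaller element in S."""
    return {u for u in s if not any(v != u and leq(v, u) for v in s)}


def in_upward_closure(x: Vec, basis: Iterable[Vec]) -> bool:
    """x belongs to the upward closure of `basis` iff it dominates some element."""
    return any(leq(m, x) for m in basis)


sample = {(2, 1), (1, 2), (3, 3), (2, 2)}
basis = minimal_elements(sample)
```

Membership in the upward closure of `sample` can then be decided against `basis` alone.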



Definition 4 (Simple constraints, upward/downward closed constraints).

Let $X = \{x_1, \ldots, x_n\}$ be a set of variables ranging over $\mathbb{N}$.

1. A simple constraint over $X$, SC for short, is any boolean combination of
constraints of the form $x \ge c$ where $x \in X$ and $c \in \mathbb{N} \cup \{\infty\}$.

2. An upward closed (resp. downward closed) constraint over $X$, UC (resp.
DC) for short, is any positive boolean combination of constraints of the form
$x \ge c$ (resp. $x < c$) where $x \in X$ and $c \in \mathbb{N} \cup \{\infty\}$.

Constraints are interpreted in the standard way as a subset of $\mathbb{N}^n$ ($\le$ is the
usual ordering and $<$ is the strict inequality). Given a simple constraint $\xi$ we
let $[\![\xi]\!]$ denote the set of vectors in $\mathbb{N}^n$ satisfying $\xi$. Notice that the
constraints $x < 0$ and $x \ge \infty$ correspond to $\emptyset$ and that $x \ge 0$ and
$x < \infty$ correspond to $\mathbb{N}^n$.

Definition 5. A set $S$ is SC (resp. UC, DC) definable if there exists an SC
(resp. UC, DC) $\xi$ such that $S = [\![\xi]\!]$.

Definition 6 (Normal forms).

1. A canonical product is a constraint of the form $\ell \le x < u$,

2. A canonical upward closed product is a constraint of the form $\ell \le x$,

3. A canonical downward closed product is a constraint of the form $x < u$,

where $\ell \in \mathbb{N}^n$ and $u \in (\mathbb{N} \cup \{\infty\})^n$.

An SC (resp. UC, DC) in normal form is either $\emptyset$, or a finite disjunction of
canonical (resp. canonical upward closed, canonical downward closed) products.

Lemma 7. Every SC (resp. UC, DC) is equivalent to an SC (UC, DC) in normal
form.

Proposition 8. SC definable sets are closed under boolean operations, and UC
definable sets as well as DC definable sets are closed under union and
intersection. The complement of a UC definable set is a DC definable set and
vice-versa. A subset of $\mathbb{N}^n$ is UC definable (resp. DC definable) if and only if
it is an upward (resp. downward) closed set. A set is SC definable if and only if
it is a boolean combination of upward closed sets.

Let $\mathcal{S} = (\Sigma, X, Q, \delta)$ be an n-dim VASS with $Q = \{q_1, \ldots, q_m\}$. Then,
every set of configurations of $\mathcal{S}$ is defined as a union
$C = \{q_1\} \times S_1 \cup \cdots \cup \{q_m\} \times S_m$ where the $S_i$'s are sets of
n-dim vectors of natural numbers. The set of configurations $C$ is SC (resp. UC,
DC) definable if all the $S_i$'s are SC (resp. UC, DC) definable.
We represent SC definable sets by simple constraints in normal form coupled
with control states. From now on, we consider a canonical product to be a pair
of the form $(q, \ell \le x < u)$ where $q \in Q$. A simple constraint is either $\emptyset$ or
a finite disjunction of canonical products. We use SC($Q$, $X$) (resp. UC($Q$, $X$),
DC($Q$, $X$)) to denote the set of simple constraints (resp. upward closed,
downward closed constraints). We omit the parameters $Q$ and $X$ when they are
known from the context.

3 Computing Successors and Predecessors 

Lemma 9. The class SC is effectively closed under the operations post and pre
for any lossy VASS.

Proof. These operations are distributive w.r.t. union. Hence, it suffices to
consider separately each transition $\tau = (q, a, \Delta, q')$ and perform them on
canonical products:

1. $post_\tau((q, \ell \le x < u)) = (q', x < u + \Delta)$.

2. $pre_\tau((q', \ell \le x < u)) = (q, (\ell - \Delta) \sqcup 0 \le x)$,

where $\forall u, v \in \mathbb{N}^n$, $u \sqcup v$ is the vector such that
$\forall i \in \{1, \ldots, n\}.\ (u \sqcup v)_i = \max(u_i, v_i)$. □
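
The two formulas in the proof can be tried out directly on bound vectors. The sketch below is an illustration under stated assumptions: bounds are Python tuples, $\infty$ is represented by `math.inf`, control states are left out, and the explicit emptiness checks (returning `None`) are an addition for this example rather than part of the formulas above.

```python
import math
from typing import Optional, Tuple

Vec = Tuple[float, ...]   # naturals, or math.inf in upper bounds


def is_empty(lower: Vec, upper: Vec) -> bool:
    """The product l <= x < u has no solution iff some l_i >= u_i."""
    return any(l >= u for l, u in zip(lower, upper))


def post_product(lower: Vec, upper: Vec, delta: Tuple[int, ...]) -> Optional[Vec]:
    """Upper bound u + delta of the post image; None if the image is empty.

    The lower bound of the image is 0, since tokens may be lost after firing.
    """
    if is_empty(lower, upper):
        return None
    new_upper = tuple(u + d for u, d in zip(upper, delta))
    return None if any(b <= 0 for b in new_upper) else new_upper


def pre_product(lower: Vec, upper: Vec, delta: Tuple[int, ...]) -> Optional[Vec]:
    """Lower bound max(l - delta, 0) of the pre image; None if the product is empty.

    The image has no upper bound, since surplus tokens may be lost before firing.
    """
    if is_empty(lower, upper):
        return None
    return tuple(max(l - d, 0) for l, d in zip(lower, delta))
```

For the product $(1,0) \le x < (3,\infty)$ and $\Delta = (-1, 2)$, `post_product` gives the upper bound $(2, \infty)$ and `pre_product` gives the lower bound $(2, 0)$.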

Notice that for lossy VASS, the pre image of any set of configurations is upward
closed and its post image is downward closed. This also holds for $pre^*$ and $post^*$.

Theorem 10. For every n-dim lossy VASS $\mathcal{S}$, and every n-dim SC set $S$, the
set $pre^*(S)$ is UC definable and effectively constructible.

Proof. Since the set $pre^*(S)$ is upward closed, by Proposition 8 we deduce that
it is UC definable. The construction of this set is similar to the one given
elsewhere for lossy channel systems. □
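
The proof only refers to an existing construction, so the following is not the authors' procedure but a sketch of the standard backward saturation idea, under these assumptions: the target set is upward closed and given by its minimal elements per control state, predecessors of a minimal element are computed as in Lemma 9, and all names are hypothetical. Termination relies on the finiteness of minimal elements (Lemma 3).

```python
from typing import Dict, List, Set, Tuple

Vec = Tuple[int, ...]
Transition = Tuple[str, str, Vec, str]   # (q1, a, delta, q2)


def leq(u: Vec, v: Vec) -> bool:
    return all(a <= b for a, b in zip(u, v))


def add_minimal(basis: Set[Vec], m: Vec) -> bool:
    """Insert m into a set of minimal vectors; True iff the upward closure grew."""
    if any(leq(b, m) for b in basis):
        return False
    basis.difference_update({b for b in basis if leq(m, b)})
    basis.add(m)
    return True


def pre_star(targets: Dict[str, Set[Vec]],
             transitions: List[Transition]) -> Dict[str, Set[Vec]]:
    """Minimal elements, per control state, of pre* of the upward closure of targets."""
    basis: Dict[str, Set[Vec]] = {}
    work: List[Tuple[str, Vec]] = []
    for q, vecs in targets.items():
        for m in vecs:
            if add_minimal(basis.setdefault(q, set()), m):
                work.append((q, m))
    while work:
        q2, m = work.pop()
        for (q1, _a, d, dst) in transitions:
            if dst != q2:
                continue
            # Smallest vector from which firing can cover m: max(m - delta, 0).
            pred = tuple(max(x - y, 0) for x, y in zip(m, d))
            if add_minimal(basis.setdefault(q1, set()), pred):
                work.append((q1, pred))
    return basis


transitions = [("q0", "a", (-1, 1), "q1"), ("q1", "b", (0, -1), "q0")]
result = pre_star({"q1": {(0, 2)}}, transitions)
```

In this example the computation stabilises after two steps, with one minimal element in each control state.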

Theorem 11. For every n-dim lossy VASS $\mathcal{S}$, and every n-dim SC set $S$, the
set $post^*(S)$ is DC definable and effectively constructible.

Proof. Since $post^*(S)$ is downward closed, by Proposition 8 we deduce that
$post^*(S)$ is DC definable. This set can be constructed using the Karp-Miller
algorithm for the construction of the coverability graph [KM69]. □

4 Automata and Automata Logic 

We use finite automata to express properties of computations. These automata 
are labeled on states and edges as well. State labels are associated with predicates 
on the configurations of a given system and edge labels are associated with the 
actions of the system. 

Definition 12. Let $\Lambda$ and $\Sigma$ be two finite alphabets. A labeled transition
graph over $(\Lambda, \Sigma)$ is a tuple $\mathcal{G} = (Q, q_{init}, \Pi, \delta)$ where $Q$ is a
finite set of states, $q_{init}$ is the initial state, $\Pi : Q \to \Lambda$ is a state
labeling function, and $\delta \subseteq Q \times \Sigma \times Q$ is a finite set of labeled
transitions. We write $q \xrightarrow{a} q'$ when $(q, a, q') \in \delta$.

Given a state $q$, a run of $\mathcal{G}$ starting from $q$ is a finite or infinite
sequence $q_0 a_0 q_1 a_1 q_2 \ldots$ such that $q_0 = q$ and
$\forall i \ge 0.\ q_i \xrightarrow{a_i} q_{i+1}$.



Definition 13 (Automata on finite sequences). A finite-state automaton
over $(\Lambda, \Sigma)$ on finite sequences is a tuple $\mathcal{A}_f = (Q, q_{init}, \Pi, \delta, F)$
where $(Q, q_{init}, \Pi, \delta)$ is a labeled transition graph over $(\Lambda, \Sigma)$, and
$F \subseteq Q$ is a set of final states. A finite sequence
$\lambda_0 a_0 \lambda_1 a_1 \ldots \lambda_n \in \Lambda(\Sigma\Lambda)^*$ is accepted by $\mathcal{A}_f$ if there
is a run $q_0 a_0 q_1 a_1 \ldots q_n$ of $\mathcal{A}_f$ starting from $q_{init}$ such that
$\forall i \in \{0, \ldots, n\}.\ \Pi(q_i) = \lambda_i$, and $q_n \in F$. Let $L(\mathcal{A}_f)$ be the set
of sequences in $\Lambda(\Sigma\Lambda)^*$ accepted by $\mathcal{A}_f$.
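
Acceptance in the sense of Definition 13 can be checked by tracking the set of states reachable by runs that match the state labels read so far. A minimal sketch, assuming the sequence is passed as two lists (state labels and actions) and using a hypothetical function name and a toy automaton:

```python
from typing import Dict, Hashable, List, Set, Tuple

State = Hashable
Edge = Tuple[State, str, State]   # (q, a, q') in delta


def accepts(q_init: State, labeling: Dict[State, str], delta: Set[Edge],
            final: Set[State], labels: List[str], actions: List[str]) -> bool:
    """Check acceptance of the sequence labels[0] actions[0] labels[1] ... labels[n]."""
    if len(labels) != len(actions) + 1:
        raise ValueError("expected one more state label than actions")
    # States reachable by some run consistent with the labels read so far.
    current = {q_init} if labeling[q_init] == labels[0] else set()
    for a, lab in zip(actions, labels[1:]):
        current = {q2 for (q1, b, q2) in delta
                   if q1 in current and b == a and labeling[q2] == lab}
    return bool(current & final)


# Toy automaton: state 0 is labeled "idle", state 1 is labeled "busy".
labeling = {0: "idle", 1: "busy"}
delta = {(0, "start", 1), (1, "tick", 1), (1, "stop", 0)}
final = {0}
```

The same state-set idea does not directly decide acceptance for the infinite-sequence automata defined next, since their acceptance condition refers to states visited infinitely often.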

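Definition 14 (Büchi $\omega$-automata). A finite-state Büchi automaton over
$(\Lambda, \Sigma)$ is a tuple $\mathcal{A}_\omega = (Q, q_{init}, \Pi, \delta, F)$ where
$(Q, q_{init}, \Pi, \delta)$ is a labeled transition graph over $(\Lambda, \Sigma)$, and
$F \subseteq Q$ is a set of repeating states. An infinite sequence
$\lambda_0 a_0 \lambda_1 a_1 \ldots \in (\Lambda\Sigma)^\omega$ is accepted by $\mathcal{A}_\omega$ if there is a run
$q_0 a_0 q_1 a_1 \ldots$ of $\mathcal{A}_\omega$ starting from $q_{init}$ such that
$\forall i \ge 0.\ \Pi(q_i) = \lambda_i$, and $\exists^\infty i \ge 0.\ q_i \in F$. We denote by
$L(\mathcal{A}_\omega)$ the set of sequences in $(\Lambda\Sigma)^\omega$ accepted by $\mathcal{A}_\omega$.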

Definition 15 (Closed ω-automata). A closed ω-automaton is a Büchi
automaton $A_{\omega c} = (Q, q_{init}, \Pi, \delta, F)$ such that $F = Q$.

Remark. Büchi automata define ω-regular sets of infinite sequences.
They are closed under boolean operations. Closed ω-automata define closed
ω-regular sets in the Cantor topology (the class F in the Borel hierarchy). They
correspond to the class of ω-regular safety properties. Closed ω-automata are
closed under intersection and union, but not under complementation.

We introduce an automata-based branching-time temporal logic called AL (Au- 
tomata Logic). This logic is defined in the spirit of the extended temporal logic 
ETL and is an extension of ECTL* |TCT| . The logic AL is more expressive 
than CTL and CTL*, and can express all ω-regular linear-time properties
on finite and infinite computations.

Definition 16 (Automata Logic). Given a set of control states $Q$ and a set
of variables $X$, we let $\mathcal{F}$ denote a subset of $SC(Q, X)$, and we let $\pi$ range over
elements of $\mathcal{F}$. Then, the set of AL($\mathcal{F}$) formulae is defined by the following
grammar:

$$\varphi ::= \pi \mid \neg\varphi \mid \varphi \vee \varphi \mid \varphi \wedge \varphi \mid \exists A_f(\varphi_1, \ldots, \varphi_m) \mid \forall A_f(\varphi_1, \ldots, \varphi_m) \mid \exists A_\omega(\varphi_1, \ldots, \varphi_m) \mid \forall A_\omega(\varphi_1, \ldots, \varphi_m)$$

where $A_f$ (resp. $A_\omega$) is a finite-state automaton on finite (resp. infinite)
sequences over $(\Lambda = \{\lambda_1, \ldots, \lambda_m\}, \Sigma)$. We consider standard abbreviations like



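Definition 17. We use $*$ to denote $f$ or $\omega$. Let $S = (\Sigma, X, Q, \delta)$ be an n-dim
(lossy) VASS. We define a satisfaction relation $\models$ between configurations of $S$ and
AL($\mathcal{F}$) formulae as follows:

$s \models (q, C)$ iff $State(s) = q$ and $Val(s) \in [\![C]\!]$
$s \models \neg\varphi$ iff $s \not\models \varphi$
$s \models \varphi_1 \vee \varphi_2$ iff $s \models \varphi_1$ or $s \models \varphi_2$
$s \models \varphi_1 \wedge \varphi_2$ iff $s \models \varphi_1$ and $s \models \varphi_2$

$s \models \exists A_*(\varphi_1, \ldots, \varphi_m)$ iff $\exists \rho = s_0 a_0 \ldots \in Run_*(s, S).\ \exists \sigma = \lambda_{i_0} a_0 \ldots \in L(A_*).$
$|\sigma| = |\rho|$ and $\forall j.\ 0 \leq j < |\rho|.\ s_j \models \varphi_{i_j}$.

$s \models \forall A_*(\varphi_1, \ldots, \varphi_m)$ iff $\forall \rho = s_0 a_0 \ldots \in Run_*(s, S).\ \exists \sigma = \lambda_{i_0} a_0 \ldots \in L(A_*).$
$|\sigma| = |\rho|$ and $\forall j.\ 0 \leq j < |\rho|.\ s_j \models \varphi_{i_j}$.

For every formula $\varphi$, let $[\![\varphi]\!] := \{ s \in S \mid s \models \varphi \}$.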

Definition 18 (Fragments of AL). $\exists$AL($\mathcal{F}$) is the fragment of AL that uses
only constraints from $\mathcal{F}$, conjunction, disjunction and existential path
quantification. $\forall$AL($\mathcal{F}$) is the fragment of AL that uses only constraints from $\mathcal{F}$,
conjunction, disjunction and universal path quantification. Let $X$ be (some fragment
of) the logic AL. Then $X_f$ (resp. $X_\omega$, $X_{\omega c}$) denotes the fragment of $X$ where only
automata on finite sequences (resp. Büchi, closed ω-automata) are used.

AL is a weaker logic than the modal μ-calculus, but many widely known temporal
logics are fragments of AL. Every propositional linear-time property, in
particular LTL properties, can be expressed in AL. CTL* is a fragment of AL
since every path formula in CTL* corresponds to an LTL formula. Thus, CTL
is also a fragment of AL. Clearly, $\forall$AL and $\exists$AL subsume the positive universal
and existential fragments of CTL*, denoted $\forall$CTL* and $\exists$CTL* (notice that LTL
is a fragment of $\forall$CTL*).

We consider two fragments of CTL called EF and EG. The logic EF uses SC
predicates, boolean operators, the one-step next operator and the operator EF,
which is defined by $[\![EF\varphi]\!] = pre^*([\![\varphi]\!])$. The logic EG is defined like EF, except
that the operator EF is replaced by the operator EG, which is defined as follows:
$s \models EG\varphi$ iff there exists a complete run that starts at $s$ and always satisfies $\varphi$.
By a complete run we mean either an infinite run or a finite run ending in a
deadlock. We use the subscripts $f$ or $\omega$ to denote the fragments of these logics
obtained by interpreting their formulae on either finite or infinite paths only.
Then, it can be seen that $EF = EF_f \subset CTL_f \subset CTL^*_f \subset AL_f$. It can also be
seen that $EG_\omega$ is a fragment of $AL_{\omega c}$ but EG is not (due to the finite paths).

5 Model Checking 

Definition 19 (Model checking and global model checking problems).

1. The model checking problem is to decide whether $s \in [\![\varphi]\!]$ for a given
configuration $s$ and formula $\varphi$.

2. The global model checking problem is whether, for any formula $\varphi$, the set
$[\![\varphi]\!]$ is effectively constructible.



Lemma 20. Let $S$ be a lossy VASS. Then for every formula $\varphi$ of the form
$\exists A_f(\pi_1, \ldots, \pi_m)$ where all the $\pi_i$ are SC, the set $[\![\varphi]\!]$ is SC definable and
effectively constructible.

Proof. By a generalized pre* construction (see Theorem MIA) . □ 



Theorem 21. The global model checking problem for lossy VASS and the logic
$AL_f$ is decidable.

Proof. By induction on the nesting-depth and Lemma 20. □

The following results even hold for non-lossy VASS. The aim is to show decidability
of the global model checking problem for VASS and the logic $\exists AL_\omega(UC)$. We
define a generalized notion of configurations of VASS which includes the symbol
$\omega$. This symbol denotes arbitrarily high numbers of tokens on a place. It is used
as an abbreviation in the following way: $(q, (\omega, \omega, \ldots, \omega, x_{k+1}, \ldots, x_n)) \models \varphi :\iff
\exists n_1, \ldots, n_k \in \mathbb{N}.\ (q, (n_1, n_2, \ldots, n_k, x_{k+1}, \ldots, x_n)) \models \varphi$. (Of course the $\omega$
can occur at any position, e.g. $(q, (x_1, x_2, \omega, x_4, \omega, x_6))$.)

Lemma 22. Let $S$ be a VASS and $\varphi$ a formula of the form $\exists A_\omega(\pi_1, \ldots, \pi_m)$
where all the $\pi_i$ are in UC. Let $s$ be a generalized configuration of $S$ (i.e. it can
contain $\omega$). It is decidable if $s \models \varphi$.

Proof. (Sketch) First construct the Karp-Miller coverability graph [KM69]. Then
check for the existence of cycles in this graph that have an overall positive effect of 
the fired transitions. These cycles may contain the same node several times. This 
check is done with the help of Parikh’s Theorem. The property holds iff such a 
cycle with overall positive effect exists, because it can be repeated infinitely often. 

□ 



Lemma 23. Let $S$ be a VASS and $\varphi$ a formula of the form $\exists A_\omega(\pi_1, \ldots, \pi_m)$
where all the $\pi_i$ are in UC. The set $[\![\varphi]\!]$ is UC definable and effectively
constructible.

Proof. $[\![\varphi]\!]$ is upward closed, because all $\pi_i$ are upward closed. Thus, it is
characterized by the finite set of its minimal elements (by the earlier lemma on
upward closed sets). To find the minimal elements, we use a construction that was
described by Valk and Jantzen in [VJ85]. The important point here is that we can use
Lemma 22 to check the existence of configurations that satisfy $\varphi$. For example, if
$(q, (\omega, x_2, x_3)) \models \varphi$ then we can check if $(q, (n_1, x_2, x_3)) \models \varphi$ for
$n_1 = 0$, $n_1 = 1$, $n_1 = 2$, ... until we find the minimal $n_1$ s.t. $(q, (n_1, x_2, x_3)) \models \varphi$. □
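
The two ingredients of this proof, representing an upward closed set by its minimal elements and searching upwards for the least witness, can be sketched as follows. This is only an illustration under stated assumptions: the predicate `holds` is a hypothetical stand-in for the decision procedure of Lemma 22, and the search terminates only when a witness is already known to exist (which is what the check on the generalized configuration with $\omega$ guarantees).

```python
def dominates(x, y):
    """True iff x >= y componentwise."""
    return all(a >= b for a, b in zip(x, y))


def minimal_elements(vectors):
    """Minimal elements of a finite set of vectors; they generate its upward closure."""
    vs = set(map(tuple, vectors))
    return sorted(v for v in vs if not any(u != v and dominates(v, u) for u in vs))


def in_upward_closure(x, basis):
    """Membership test for the upward closed set generated by `basis`."""
    return any(dominates(x, b) for b in basis)


def least_witness(holds):
    """Smallest n >= 0 with holds(n). Loops forever if no such n exists."""
    n = 0
    while not holds(n):
        n += 1
    return n


# Hypothetical upward closed set over two counters.
BASIS = minimal_elements([(2, 0), (1, 1), (3, 1), (1, 2)])
print(BASIS)  # [(1, 1), (2, 0)]

# Least n1 such that (n1, 0) belongs to the set.
print(least_witness(lambda n: in_upward_closure((n, 0), BASIS)))  # 2
```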



Theorem 24. The global model checking problem is decidable for VASS and the
logic $\exists AL_\omega(UC)$.

Proof. By induction on the nesting-depth of the formula and Lemma 23. □



Theorem 25. The global model checking problem is decidable for lossy VASS
and the logic $\exists AL(UC)$.

Proof. By induction on the nesting depth and Theorems 21 and 24. □



Theorem 26. The model checking problem for lossy VASS and $AL_{\omega c}$ is decidable.

Proof. By induction on the nesting-depth of the formula and an analysis of all 
computations which is finite by Dickson’s Lemma. □ 



Theorem 27. Model checking lossy VASS with the logic EG is decidable. 

Theorems 26 and 27 say that the model checking problem is decidable for a
lossy VASS and an EG-formula / $AL_{\omega c}$-formula $\varphi$. However, in both cases the
set $[\![\varphi]\!]$ is not effectively constructible (although it is SC definable). If it were
constructible then Lemma 20 could be used to decide model checking lossy VASS
with formulae of the form $EF\,EG_\omega\pi$, where $\pi$ is a constraint in SC. However,
this problem has very recently been shown to be undecidable.

Proposition 28. Model checking lossy VASS with formulae of the form
$EF\,EG_\omega\pi$, where $\pi$ is a constraint in SC, is undecidable.

Proof. This is a corollary of a more general undecidability result for lossy BPP
(Basic Parallel Processes), which follows (not immediately) from the result on
lossy counter machines in Proposition 30 (see [May98]). □

Remark. This undecidability result also implies undecidability of model checking
lossy VASS with the logic $\exists AL_\omega$. One can encode properties of the form
$EF\,EG_\omega\pi$ in $\exists AL_\omega$ in the following way: Let $A_\omega$ be an automaton with states
$q, q'$, and transitions $q \to q$, $q \to q'$ and $q' \to q'$ which are labeled with any
action. The predicate true is assigned to $q$ and the predicate $\pi$ is assigned to $q'$.
$q$ is the initial state and $q'$ is the only repeating state. Let $A'_\omega$ be an automaton
with only one state $q$ which is the initial state and repeating and a transition
$q \to q$ with any action. The predicate $\pi$ is assigned to $q$. Then for any configuration
$s$ of a lossy VASS we have $s \models EF\,EG_\omega\pi \iff s \models \exists A_\omega(true, \pi) \vee \exists A'_\omega(\pi)$.

Lossy VASS can be extended with inhibitor arcs. This means introducing tran- 
sitions that can only fire if some defined places are empty (i.e. they can test for 
zero). Thus lossy VASS with inhibitor arcs are equivalent to lossy counter ma- 
chines. Normal VASS with inhibitor arcs are Turing-powerful, but lossy VASS 
with inhibitor arcs are not. 

Theorem 29. For lossy VASS with inhibitor arcs 

1. the global model checking problem is decidable for the logic $AL_f$.

2. model checking is decidable for the logics $AL_{\omega c}$ and EG.

Inhibitor arcs can never keep a transition from firing, because one can just lose
the tokens on the places that inhibit it. However, after such a transition has
fired, the number of tokens on the inhibiting places is fixed and known exactly. 
Such a guarantee is impossible to achieve in lossy VASS without inhibitor arcs. 
Thus not all results for lossy VASS carry over to lossy VASS with inhibitor arcs. 

Proposition 30. Let $S$ be a lossy VASS with inhibitor arcs. It is undecidable if
there exists an initial configuration $s$ s.t. there is an infinite run of $(s, S)$.

Proof. This is a corollary of a more general undecidability result for lossy counter
machines in [May98]. The main idea is that one can enforce that lossiness occurs
only finitely often in the infinite run. □

Theorem 31. Model checking lossy VASS with inhibitor arcs with the logic LTL 
is undecidable. 

Proof. We reduce the problem of Proposition 30 to the model checking problem.
We construct a lossy VASS with inhibitor arcs $S'$ that does the following: First
it guesses an arbitrary configuration $s$ of $S$ doing only the atomic action $a$.
Then it simulates $S$ on $s$ doing only the atomic action $b$. Let $A_\omega$ be a Büchi
automaton with initial state $q$ and repeating state $q'$ and transitions $q \xrightarrow{a} q$, $q \xrightarrow{b} q'$
and $q' \xrightarrow{b} q'$. Let $s'$ be the initial state of $S'$. We have reduced the question of
Proposition 30 to the question if $(s', S') \models \exists A_\omega(true, true)$. This question can
be expressed in LTL. □

It follows immediately that model checking lossy VASS with inhibitor arcs with
$\exists AL_\omega(UC)$ is undecidable. It is interesting to compare this result with
Proposition 28. For undecidability it suffices to have either inhibitor arcs in the system
or downward closed constraints in the logic. One can be encoded in the other
and vice versa. The set $post^*(s)$ is DC definable since it is downward closed.
However, it is not constructible for lossy VASS with inhibitor arcs (unlike for
lossy VASS, see Theorem 11).

Theorem 32. $post^*(s)$ is not constructible for lossy VASS with inhibitor arcs.

Proof. Boundedness is undecidable for reset Petri nets [DFS98]. This result
carries over to lossy reset Petri nets. Lossy VASS with inhibitor arcs can simulate
lossy reset Petri nets. It follows that boundedness is undecidable for lossy VASS
with inhibitor arcs and thus $post^*(s)$ is not constructible. □

6 Conclusion 

We have established results for normal VASS and lossy VASS with inhibitor arcs 
(lossy counter machines). Interestingly, it turns out that these two models are 
incomparable. Moreover, all the positive/negative results we obtained for lossy 
VASS with inhibitor arcs are the same as for lossy fifo-channel systems. Note 
that lossy fifo-channel systems can simulate lossy VASS with inhibitor arcs, but 
only with some additional deadlocks. 

The following table summarizes the results on the decidability of model checking
for VASS, lossy VASS with test for zero, lossy VASS and lossy fifo-channel
systems. By '++' we denote the fact that for any formula $\varphi$ the set $[\![\varphi]\!]$ is SC
definable and effectively constructible (global model checking), while '+' means
that only model checking is decidable. We denote by '-' that model checking is
undecidable. The symbol '?' denotes an open problem.



Logic              VASS             Lossy VASS+0   Lossy VASS   Lossy FIFO
AL_f / EF          - [Esp97]        ++             ++ [AJ93]    ++ [AJ93]
∃AL_ω(UC) / LTL    ++ / + [Esp97]   -              ++           - [AJ96]
∃AL(UC)            ?                -              ++           - [AJ96]
AL_ωc / EG                          +              + [AJ93]     + [AJ93]
∃AL_ω / CTL        - [EK95]         -              -            - [AJ96]



The results in this table are new, except where references are given. For normal
VASS and LTL, decidability of the model checking problem was known [Esp97],
but the construction of the set $[\![\varphi]\!]$ is new. The results in [AJ93] are just about
EF and EG formulae without nesting, not for the full logics $AL_f$ and $AL_{\omega c}$.

Abstract. Let $G = (V, E, w)$ be an undirected graph with nonnegative
edge weight. For any spanning tree $T$ of $G$, the weight of $T$ is the total
weight of its tree edges and the routing cost of $T$ is $\sum_{u,v \in V} d_T(u, v)$,
where $d_T(u, v)$ is the distance between $u$ and $v$ on $T$. In this paper,
we present an algorithm providing a trade-off among tree weight, routing
cost and time complexity. For any real number $\alpha > 1$ and an integer
$1 \leq k \leq 6\alpha - 3$, in $O(n^{k+1} + n^3)$ time, the algorithm finds a spanning tree
whose routing cost is at most $(1 + 2/(k + 1))\alpha$ times the one of the minimum
routing cost tree, and the tree weight is at most $(f(k) + 2/(\alpha - 1))$
times the one of the minimum spanning tree, where $f(k) = 1$ if $k = 1$
and $f(k) = 2$ if $k > 1$.

Keywords: approximation algorithms, network design, spanning trees. 



1 Introduction 

Constructing spanning trees of graphs is a classical network design problem.
Typically, we are given a graph $G = (V, E, w)$, where $w$ is a nonnegative edge
weight function. The weight on each edge represents the distance and reflects
both the cost to install the link (building cost) and the cost to traverse the
link after the link is installed (routing cost). If we only consider the building
cost, we are looking for a spanning tree $T = (V, E_T)$ with minimum tree weight
$w(T) = \sum_{e \in E_T} w(e)$, i.e., the minimum spanning tree (MST) of the graph. If
only the routing cost is considered, the goal is to find the spanning tree $T$ with
minimum $c(T) = \sum_{u,v \in V} d_T(u, v)$, where $d_T(u, v)$ is the distance (total weight
of the path) between $u$ and $v$ on $T$. Such a tree is called the minimum routing
cost spanning tree (MRCT) (or the shortest total path length spanning tree) of
the graph. Although both MST and MRCT tend to use light edges, a tree with
small weight may have a large routing cost and vice versa. For instance, we can
easily construct a graph such that the routing cost of its MST is $\Theta(n)$ times the
routing cost of its MRCT. Similarly, a spanning tree with a constant times the
minimum routing cost may have a tree weight as large as $\Theta(n)$ times the weight
of MST. Therefore, we often need to make a trade-off between the two costs.
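
The tension between the two costs can be seen on a tiny example. The sketch below computes the tree weight and, directly from the definition, the routing cost as a sum over ordered pairs of vertices. The six-vertex graph, with path edges of weight 10 and star edges of weight 11, is a hypothetical instance chosen for illustration; the helper names are likewise assumptions.

```python
from collections import defaultdict


def tree_weight(edges):
    """Total weight of the tree edges, given as (u, v, w) triples."""
    return sum(w for _, _, w in edges)


def routing_cost(edges):
    """Sum of d_T(u, v) over all ordered pairs (u, v), straight from the definition."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    total = 0
    for source in adj:
        # Tree distances from `source` by a depth-first traversal.
        dist = {source: 0}
        stack = [source]
        while stack:
            x = stack.pop()
            for y, w in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + w
                    stack.append(y)
        total += sum(dist.values())
    return total


n = 6
# The path uses only the lightest edges, so it is the MST of the example graph.
path = [(i, i + 1, 10) for i in range(n - 1)]
# The star centered at vertex 0 uses slightly heavier edges.
star = [(0, 1, 10)] + [(0, i, 11) for i in range(2, n)]

print(tree_weight(path), routing_cost(path))  # 50 700
print(tree_weight(star), routing_cost(star))  # 54 540
```

The path is lighter but has the larger routing cost; as $n$ grows, the gap in routing cost between such a path and a star widens.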

The minimum spanning tree is a fundamental problem and efficient polyno- 
mial time algorithms were developed (for example, see 03|)- The MRCT prob- 
lem is a special case of the optimum communication spanning tree problem |Sj. 
Unfortunately, the MRCT problem has been shown to be NP-hard mni, and a
2-approximation algorithm was presented in 0 . Recently, a polynomial time ap- 
proximation scheme (PTAS) for the MRCT problem was presented in 0. In this 
paper, we present an algorithm for finding a spanning tree that simultaneously 
approximates both the two costs with constant ratios. 

Let $MRCT(G)$ and $MST(G)$ denote the MRCT and the MST of $G$ respectively.
We define the light approximate routing cost spanning tree (LART) of $G$
as follows:

Definition 1: For $\alpha \geq 1$ and $\beta \geq 1$, an $(\alpha, \beta)$-LART is a spanning tree $T$ of $G$
with $c(T) \leq \alpha \times c(MRCT(G))$ and $w(T) \leq \beta \times w(MST(G))$.

The main result of this paper is stated in the following theorem. 

Theorem 1: Given a graph $G$, a $\left(\frac{k+3}{k+1}\alpha,\ f(k) + \frac{2}{\alpha-1}\right)$-LART can be
constructed in $O(n^{k+1} + n^3)$ time for any real number $\alpha > 1$ and an integer
$1 \leq k \leq 6\alpha - 3$, where $f(k) = 1$ if $k = 1$ and $f(k) = 2$ if $k > 1$.

Several results for trees realizing trade-offs between weight and distance
requirements can be found in the literature. In [7], an algorithm for constructing
an $(\alpha, \beta)$-LAST (light approximate shortest-path tree) was presented. For
any $v \in V(G)$, an $(\alpha, \beta)$-LAST rooted at $v$ is a spanning tree $T$ of $G$ with
$d_T(i, v) \leq \alpha\, d_G(i, v)$ for any vertex $i \in V(G)$ and $w(T) \leq \beta\, w(MST(G))$. Khuller
et al. showed that it is possible to construct an $(\alpha, 1 + 2/(\alpha - 1))$-LAST for any
$\alpha > 1$. We used their algorithm as a kernel subroutine in our algorithm.

Considerable work has been done on the spanner of a graph recently. In 
general, a t-spanner of G is a low-weight subgraph of G such that, for any two 
vertices, the distance on the spanner is at most t times the distance on G. 
When the spanner is restricted to a spanning tree, it is called a tree t-spanner. 
Obviously, a tree t-spanner has a more strict distance requirement than the 
routing cost considered in this paper. However, because of the strict distance 
requirement, some graphs do not have a tree t-spanner for any constant t (e.g. 
a cycle with identical weight on each edge). Some methods for finding spanners 
of a weighted graph were presented in [J- 

We briefly outline the basic ideas for finding an (o;,/3)-LART of a graph as 
follows. 

1. Although the input graph of our problem is a general graph, we show that 
it can be reduced to the one with a metric input. In |S|, an algorithm was 
developed for transforming a spanning tree of $\bar{G}$ to a spanning tree of $G$
without increasing the routing cost, where $\bar{G}$ is the metric closure (defined
later) of $G$. By observing that the algorithm does not increase the weight of
the tree as well, we show that the problem of finding an $(\alpha, \beta)$-LART of $G$
can be reduced to that of finding an $(\alpha, \beta)$-LART of $\bar{G}$.

2. A k-star is a spanning tree with at most $k$ internal nodes. By showing that
there exists a k-star with routing cost at most $\left(1 + \frac{2}{k+1}\right)$ times the one
of $MRCT(G)$, the PTAS in 0 finds the minimum routing cost k-star in
$O(n^{2k})$ time, in which $n$ is the number of vertices. However, the weight
of the minimum k-star may be large. Consider the case when $k = 1$. The
minimum 1-star is a shortest-path tree rooted at some vertex. The weight of
the minimum 1-star may be $\Theta(n)$ times the weight of the minimum spanning
tree.

3. Consider the algorithm in [7] for constructing an $(\alpha, 1 + 2/(\alpha - 1))$-LAST
rooted at a vertex. We show that the LAST with minimum routing cost is
a $(2\alpha, 1 + 2/(\alpha - 1))$-LART. To find a LART achieving a more general
trade-off, we use k-stars. Let $R$ be a vertex set containing the $k$ internal nodes of
the k-star with approximate routing cost of the $MRCT(G)$. If $R$ is given,
we can construct a light approximate shortest-path forest with multiple roots
$R$ by the algorithm in [7]. Then we combine the forest into a tree $T$ by
adding the edges of the minimum spanning tree of $R$. We show that it is a
$\left(\frac{k+3}{k+1}\alpha,\ 2 + \frac{2}{\alpha-1}\right)$-LART. Finally, since $R$ is unknown, we try all possible
subsets containing $k$ vertices.

The remaining sections are organized as follows: In Section 2, some definitions 
are given. The reduction to metric input is shown in Section 3 and the algorithm 
is described in Section 4. The performance ratios are shown in Section 5. 

2 Preliminaries 

In this paper, a graph $G = (V, E, w)$ is a simple, connected, undirected graph, in
which w is a nonnegative edge weight function. For any graph G, V{G) denotes 
the vertex set and E{G) denotes the edge set of G. We shall use n to denote the 
number of vertices in the input graph. We first give some definitions below: 

Definition 2: A metric graph $G$ is a complete graph, in which the edge weights
satisfy the triangle inequality.

Definition 3: Let $G = (V, E, w)$ be a graph. We use $w(G) = \sum_{e \in E} w(e)$ to
denote the graph weight. By $SP_G(u, v)$, we denote a shortest path between $u$
and $v$ on $G$, and $d_G(u, v) = w(SP_G(u, v))$ is the shortest path length. The routing
cost of a tree $T$ is defined by $c(T) = \sum_{u,v \in V(T)} d_T(u, v)$.

Definition 4: Let $G$ be a graph. The metric closure of $G$, denoted by $\bar{G}$, is the
complete graph with $V(\bar{G}) = V(G)$ and $\bar{w}(u, v) = d_G(u, v)$ for any $u, v$, where
$\bar{w}$ is the edge weight function of $\bar{G}$.
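
As a concrete illustration of Definition 4, the metric closure can be computed from all-pairs shortest paths. The sketch below uses the Floyd-Warshall recurrence, which is one standard choice and not necessarily the method intended by the authors; the function name and the three-vertex example are assumptions.

```python
import math


def metric_closure(n, edges):
    """Return the n x n matrix of shortest-path lengths d_G(u, v).

    Vertices are 0 .. n-1 and edges are (u, v, w) triples of an undirected graph.
    """
    d = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        d[u][v] = d[v][u] = min(d[u][v], w)
    # Floyd-Warshall: allow intermediate vertices 0 .. k one at a time.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


# Triangle in which the direct edge (0, 2) is heavier than the detour via vertex 1.
closure = metric_closure(3, [(0, 1, 1), (1, 2, 1), (0, 2, 5)])
print(closure[0][2])  # 2
```

In this example the original edge (0, 2) has weight 5 while the closure assigns it weight 2, which is exactly the situation the reduction to metric input has to repair.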



Definition 5: Let $G = (V, E, w)$ be a graph and $r \in V$. A spanning tree $T$ is
a shortest-path tree of $G$ rooted at $r$ if $d_T(r, v) = d_G(r, v)$ for each $v \in V$.

Definition 6: Given a graph $G$, a minimum routing cost spanning tree of $G$, denoted
by $MRCT(G)$, is a spanning tree $T$ such that $c(T)$ is minimum among all possible
spanning trees.

The following definition and lemma are given in [7].

Definition 7: Let $G = (V, E, w)$ be a graph and $r \in V$. For $\alpha \geq 1$ and $\beta \geq 1$,
an $(\alpha, \beta)$-LAST rooted at $r$ is a spanning tree $T$ of $G$ with $d_T(r, v) \leq \alpha\, d_G(r, v)$
for each $v \in V$ and $w(T) \leq \beta \times w(MST(G))$.

Lemma 2 ([7]): Let $G$ be a graph with nonnegative edge weights. For any
$r \in V(G)$ and $\alpha > 1$, there exists an algorithm, named FIND-LAST, which can
construct an $(\alpha, 1 + 2/(\alpha - 1))$-LAST rooted at $r$ in $O(n)$ time if the minimum
spanning tree and the shortest path tree of $G$ are given.



3 Reduction to Metric Input 

We now show that the problem of finding a LART of a graph can be reduced 
to that of finding a LART of its metric closure. The reduction is done by a 
transformation algorithm in 0. It was developed for the MRCT problem, and 
can be shown that it also works for the LART problem. 

Let $G = (V, E, w)$ and $\bar{G}$ be its metric closure with edge weight $\delta$. Any edge
$(a, b)$ in $\bar{G}$ is called a bad edge if $(a, b) \notin E$ or $w(a, b) > \delta(a, b)$. For any bad edge
$e = (a, b)$, there must exist a path $P = SP_G(a, b) \neq e$ such that $w(P) = \delta(a, b)$.
Given any spanning tree $T$ of $\bar{G}$, the algorithm can construct another spanning
tree $Y$ without any bad edge such that $c(Y) \leq c(T)$. Since $Y$ has no bad edge,
$\delta(e) = w(e)$ for each $e \in E(Y)$, and $Y$ can be thought of as a spanning tree of $G$
with the same routing cost. The algorithm is listed in the following.



Algorithm Remove_bad

Compute all-pairs shortest paths of G.
while there exists a bad edge in T
    Pick a bad edge (a, b). Root T at a.
    /* assume SP_G(a, b) = (a, x, ..., b) and y is the father of x */
    if b is not an ancestor of x then
        Y* = T ∪ (x, b) \ (a, b); Y** = Y* ∪ (a, x) \ (x, y);
    else
        Y* = T ∪ (a, x) \ (a, b); Y** = Y* ∪ (b, x) \ (x, y);
    endif
    if c(Y*) < c(Y**) then Y = Y* else Y = Y** endif
    T = Y.
endwhile

The algorithm computes $Y$ by iteratively replacing the bad edges until there
is no bad edge left. It was shown that the cost is never increased at each
iteration and it takes no more than $O(n^2)$ iterations. The result is stated in the
following lemma.

Lemma 3 ([0|): For any spanning tree $\hat{T}$ of $\bar{G}$, it can be transformed into a
spanning tree $T$ of $G$ in $O(n^3)$ time and $c(T) \leq c(\hat{T})$.

The main result of this section is summarized in the following lemma: 

Lemma 4: If there is an algorithm for finding an $(\alpha, \beta)$-LART of a metric graph
with time complexity $t(n)$, then there is an algorithm for the $(\alpha, \beta)$-LART of a
general graph with time complexity $O(n^3 + t(n))$.

Proof: Given any graph $G$, we first construct its metric closure $\bar{G}$ in $O(n^3)$
time. We then find an $(\alpha, \beta)$-LART $\hat{T}$ of $\bar{G}$ in $t(n)$ time. Finally, using the
transformation algorithm, we obtain a spanning tree $T$ of $G$.

By Lemma 3, $c(T) \leq c(\hat{T})$. Since Lemma 3 is true for any spanning tree,
it also implies that $c(MRCT(G)) \leq c(MRCT(\bar{G}))$. Furthermore, it is easy
to see that $c(MRCT(G)) \geq c(MRCT(\bar{G}))$. It follows that $c(MRCT(G)) =
c(MRCT(\bar{G}))$, and we have

$$c(T) \leq c(\hat{T}) \leq \alpha\, c(MRCT(\bar{G})) = \alpha\, c(MRCT(G)).$$

Since, in each iteration, the transformation algorithm does not increase the tree
weight either, we have $w(T) \leq w(\hat{T})$. It is easy to show that $w(MST(G)) =
w(MST(\bar{G}))$. Thus,

$$w(T) \leq w(\hat{T}) \leq \beta\, w(MST(\bar{G})) = \beta\, w(MST(G)).$$

Therefore, $T$ is an $(\alpha, \beta)$-LART of $G$. □

By Lemma 4, we can focus on the problem with a metric input. In the rest
of this paper, we shall assume the input graph $G$ is a metric graph.



4 The Algorithm 



In this section, we present the algorithm for finding a LART and analyze its 
time complexity. 

Definition 8: Let $G = (V, E, w)$ be a graph (not necessarily a metric), $R \subseteq V$,
and $r \notin V$. We define $G^{+R} = (U, F, \bar{w})$, in which $U = V \cup \{r\}$, $F = E \cup
\{(r, u) \mid u \in R\}$, $\bar{w}(e) = w(e)\ \forall e \in E$ and $\bar{w}(e) = 0\ \forall e \in F \setminus E$.
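
A minimal sketch of Definition 8 follows. It adds the new vertex $r$ with zero-weight edges to every vertex of $R$ and then computes shortest paths from $r$, so that the distance from $r$ to a vertex $v$ in $G^{+R}$ equals the distance from $v$ to its nearest vertex of $R$ in $G$. The helper names, the choice of Dijkstra's algorithm and the five-vertex path example are assumptions for illustration.

```python
import heapq


def augment(n, edges, R):
    """Build G^{+R}: a new vertex r = n joined to every vertex of R by a zero-weight edge."""
    r = n
    return r, list(edges) + [(r, u, 0) for u in sorted(R)]


def distances_from(n, edges, source):
    """Dijkstra's algorithm on an undirected graph with vertices 0 .. n-1."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = [float("inf")] * n
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue  # stale queue entry
        for y, w in adj[x]:
            if d + w < dist[y]:
                dist[y] = d + w
                heapq.heappush(heap, (dist[y], y))
    return dist


# Hypothetical input: the path 0 - 1 - 2 - 3 - 4 with unit weights and R = {0, 4}.
path_edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1)]
r, plus_edges = augment(5, path_edges, {0, 4})
print(distances_from(6, plus_edges, r))  # [0, 1, 2, 1, 0, 0]
```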

The algorithm for finding a LART is listed below.

Algorithm FIND-LART

Input: a graph $G$, a real number $\alpha > 1$, and an integer $1 \leq k \leq 6\alpha - 3$.

Output: a $\left(\frac{k+3}{k+1}\alpha,\ f(k) + \frac{2}{\alpha-1}\right)$-LART of $G$,
where $f(k) = 1$ if $k = 1$ and $f(k) = 2$ if $k > 1$.

Step 1: Find $T_m = MST(G)$.

Step 2: For each $R \subseteq V(G)$ with $|R| \leq k$, use the following method to
construct a spanning tree, and keep $T$ with minimum $c(T)$.

    Step 2.0: Assume $R = \{r_j \mid 1 \leq j \leq q\}$ and $E_R = \{(r, r_j) \mid 1 \leq j \leq q\}$.
    Step 2.1: Construct $G^{+R}$.
    Step 2.2: Find $MST(G^{+R})$.
    Step 2.3: Find the shortest path tree of $G^{+R}$ rooted at $r$.
    Step 2.4: Call algorithm FIND-LAST to find an $(\alpha, 1 + 2/(\alpha - 1))$-LAST
              rooted at $r$.
              /* Let the tree be $T_1$. We can assume $E_R \subseteq E(T_1)$
                 since $\bar{w}(e) = 0\ \forall e \in E_R$. */
    Step 2.5: Delete the edges $E_R$ from $T_1$.
    Step 2.6: Find $T_0 = MST(G|_R)$, where $G|_R$ is the induced subgraph
              with vertex set $R$.
    Step 2.7: Set $T = T_0 \cup T_1$.
    Step 2.8: Compute $c(T)$.

In the algorithm, Step 1 is executed for the sake of time efficiency. With the
results of Step 1, Step 2.2 can be done in linear time (shown later). All the
steps within Step 2 can be easily done in $O(n^2)$ time by direct methods. Our
goal is to show that they can be done in $O(n)$ time. To achieve this goal, we
only need to show that Steps 2.2, 2.3 and 2.8 can be done in $O(n)$ time. The
others are trivial.

We first focus on $MST(G^{+R})$. The following property of MST will be used:
Let $H$ be a graph and $V_1$, $V_2$ be any partition of $V(H)$. For any edge $e$ crossing
the cut (i.e. with one end point in $V_1$ and the other in $V_2$), $e$ belongs to a MST
of H if and only if e is the lightest edge crossing the cut j^. 

Lemma 5: Let $H = (V, E)$ be a graph and $Y = (V, E_Y) = MST(H)$. Assume
$e \notin E$ and $H_1 = (V, E \cup \{e\})$, $Y_1 = (V, E_Y \cup \{e\})$. Then a MST of $Y_1$ is also a
MST of $H_1$ and can be found in linear time.

Proof: For any edge $(x, y)$ of $MST(H_1)$, by the property of MST, it is the
lightest edge crossing some cut $(V_1, V_2)$ of $H_1$. However, $H_1$ is obtained by
inserting an edge $e$ into $H$. Thus, either $(x, y) = e$ or $(x, y)$ is the lightest edge crossing
the cut $(V_1, V_2)$ of $H$. This implies $(x, y) \in E(Y_1)$ and then $MST(H_1) \subseteq Y_1$. We
have $w(MST(Y_1)) \leq w(MST(H_1))$. Furthermore, since $Y_1$ is a subgraph of $H_1$,
$w(MST(H_1)) \leq w(MST(Y_1))$. It follows that $w(MST(H_1)) = w(MST(Y_1))$.
Since there is only one cycle in $Y_1$, $MST(Y_1)$ can be found by deleting the
heaviest edge in the cycle, which can be done in linear time. □
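
The last step of this proof can be sketched as follows: adding one edge to a spanning tree closes exactly one cycle, and removing the heaviest edge of that cycle restores a minimum spanning tree. The function name and the edge-list representation are assumptions; the input is assumed to be a spanning tree that does not already contain an edge between the endpoints of the new edge.

```python
def insert_edge(tree_edges, new_edge):
    """MST of (spanning tree + one extra edge): drop the heaviest edge on the cycle."""
    u, v, w = new_edge
    adj = {}
    for a, b, wt in tree_edges:
        adj.setdefault(a, []).append((b, wt))
        adj.setdefault(b, []).append((a, wt))
    # Depth-first search from u, remembering the tree edge used to reach each vertex.
    parent = {u: None}
    stack = [u]
    while stack:
        x = stack.pop()
        for y, wt in adj.get(x, []):
            if y not in parent:
                parent[y] = (x, wt)
                stack.append(y)
    # Walk back from v to u: this tree path plus the new edge is the unique cycle.
    heaviest = (u, v, w)
    x = v
    while parent[x] is not None:
        p, wt = parent[x]
        if wt > heaviest[2]:
            heaviest = (p, x, wt)
        x = p
    if heaviest == (u, v, w):
        return list(tree_edges)  # the new edge is itself the heaviest; tree unchanged
    kept = [e for e in tree_edges if {e[0], e[1]} != {heaviest[0], heaviest[1]}]
    return kept + [new_edge]


print(insert_edge([(0, 1, 1), (1, 2, 5)], (0, 2, 2)))  # [(0, 1, 1), (0, 2, 2)]
print(insert_edge([(0, 1, 1), (1, 2, 5)], (0, 2, 9)))  # [(0, 1, 1), (1, 2, 5)]
```

Both the traversal and the walk along the cycle touch each tree edge a constant number of times, which matches the linear time bound claimed in the lemma.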

Since $|R| \leq k$, the next lemma can be shown by iteratively applying Lemma 5
and we omit the proof.

Lemma 6: $MST(T_m^{+R})$ is a minimum spanning tree of $G^{+R}$ and can be found
in $O(kn)$ time.

The next two lemmas show the time complexities for constructing a shortest-path
tree and for computing the routing cost of a tree respectively. The proofs
are omitted here.

Lemma 7: The shortest path tree of $G^{+R}$ rooted at $r$ can be found in $O(kn)$
time.

Lemma 8: For any tree $T$, $c(T)$ can be computed in $O(n)$ time.

Now, the time complexity of the algorithm can be easily shown by Lemmas
2, 6, 7 and 8.

Lemma 9: The time complexity of Algorithm FIND-LART is $O(n^{k+1})$.

Lemma 4 and Lemma 9 prove the time complexity in Theorem 1. We prove
the ratio bounds of the weight and routing cost in the following section.

5 The Performance Analysis 

In this section, we analyze the weight and the routing cost of the tree $T$
constructed by algorithm FIND-LART. The following lemma can be easily proved
by Lemma 2 and observing that $w(MST(G|_R)) \leq w(MST(G))$. The proof is
omitted.

Lemma 10: $w(T) \leq \left(f(k) + \frac{2}{\alpha-1}\right) w(MST(G))$, where $f(k) = 1$ if $k = 1$ and
$f(k) = 2$ if $k > 1$.

To show the bound on the routing cost, we first introduce the δ-spine of a
tree, which is defined in 0. We shall use the δ-spine to derive a lower bound on
$c(MRCT(G))$ and then show the approximation ratio of the routing cost.

5.1 The δ-Spine of a Tree

The δ-spine was introduced in 0. For the completeness of this paper, we describe
some necessary definitions and results here.

Definition 9: Let $T$ be a spanning tree of $G$, and let $S$ be a connected subgraph
of $T$. For a positive number $\delta \leq 1/2$, $S$ is a δ-separator of $T$ if $|V(B)| \leq \delta n$
for every connected component $B$ in the induced subgraph $T|_{V(T) \setminus V(S)}$. A
δ-separator $S$ is minimal if any proper subgraph of $S$ is not a δ-separator of $T$.

Definition 10: Let $T$ be a spanning tree of $G$, and let $S$ be a connected
subgraph of $T$. Deleting the edges in $E(S)$ from $T$ will result in several subtrees.
We use $VB(T, S, i)$ to denote the vertex set of the subtree containing $i$.

Definition 11: Let $T$ be a tree and $P = SP_T(u, v)$ in which $|VB(T, P, u)| \geq
|VB(T, P, v)|$. We define $P^a = |VB(T, P, u)|$, $P^b = |VB(T, P, v)|$, and $P^c =
n - |VB(T, P, u)| - |VB(T, P, v)|$.

Definition 12: For a tree $T$ and $0 < \delta \leq 0.5$, a δ-path of $T$ is a path $P$ such
that $P^c \leq \delta n/2$.

Definition 13: Let $0 < \delta \leq 0.5$. A δ-spine $Y = \{P_1, P_2, \ldots, P_h\}$ of $T$ is a set
of pairwise edge-disjoint δ-paths in $T$ such that $S = \bigcup_{1 \leq i \leq h} P_i$ is a minimal
δ-separator of $T$. We define the cut and leaf set $CAL(Y)$ of a δ-spine $Y$ to be
the set of the endpoints of the paths in $Y$. In the case that $Y$ is empty, $CAL(Y)$
is defined to be the vertex which is the minimal δ-separator.

It should be noted that for any pair of distinct paths $P_i$ and $P_j$ in the spine,
they either do not intersect or, if they do, the intersection point is an endpoint
of both paths. This can be easily shown by definition. The next lemma is given
in |S|, which shows the existence of the δ-spine with small cut and leaf set.

Lemma 11: For any constant $0 < \delta \leq 0.5$, and spanning tree $T$ of $G$,
there exists a δ-spine $Y$ of $T$ such that $|CAL(Y)| \leq \lceil 2/\delta \rceil - 3$.

5.2 The Routing Cost 

The following lemma is immediate and we omit the proof. 

Lemma 12: For any spanning tree $T$ of $G$, $c(T) = 2 \sum_{e \in E(T)} e^a e^b w(e)$.
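
The identity of Lemma 12 also yields the linear-time computation claimed in Lemma 8: removing a tree edge splits the vertices into two sides of sizes $e^a$ and $e^b = n - e^a$, and the edge lies on exactly $2 e^a e^b$ ordered paths. A minimal sketch, assuming vertices are numbered $0, \ldots, n-1$ and using a hypothetical function name:

```python
def routing_cost_linear(n, edges):
    """c(T) = 2 * sum over tree edges of e^a * e^b * w(e), computed in O(n) time."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    # Iterative DFS from vertex 0 records each vertex's parent and a preorder.
    parent = [-1] * n
    parent_w = [0] * n
    seen = [False] * n
    seen[0] = True
    order = []
    stack = [0]
    while stack:
        x = stack.pop()
        order.append(x)
        for y, w in adj[x]:
            if not seen[y]:
                seen[y] = True
                parent[y] = x
                parent_w[y] = w
                stack.append(y)
    # Process vertices in reverse preorder so every subtree size is final when used.
    size = [1] * n
    total = 0
    for x in reversed(order):
        if parent[x] != -1:
            total += size[x] * (n - size[x]) * parent_w[x]
            size[parent[x]] += size[x]
    return 2 * total


# Path on 6 vertices with edge weight 10, and a star with slightly heavier edges.
print(routing_cost_linear(6, [(i, i + 1, 10) for i in range(5)]))                # 700
print(routing_cost_linear(6, [(0, 1, 10)] + [(0, i, 11) for i in range(2, 6)]))  # 540
```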

Let $d_G(v, U)$ denote $\min_{u \in U}\{d_G(v, u)\}$ for any $v \in V(G)$ and $U \subseteq V(G)$. We
now derive a lower bound on $c(MRCT(G))$.

Lemma 13: Let $Y$ be a δ-spine of a spanning tree $T$ of $G$, and let $S = \bigcup_{P \in Y} P$
be a minimal δ-separator of $T$. Then

$$c(T) \geq 2(1 - \delta)n \sum_{v \in V(G)} d_T(v, V(S)) + 2\delta(1 - \delta)n^2 w(S).$$

Proof: By Lemma 12 we have

$$c(T)/2 = \sum_{e \in E(T)} e^a e^b w(e) = \sum_{e \in E(T) \setminus E(S)} e^a e^b w(e) + \sum_{e \in E(S)} e^a e^b w(e) \qquad (1)$$

By observing that

$$\sum_{v \in V(G)} d_T(v, V(S)) = \sum_{v \in V(G) \setminus V(S)} d_T(v, V(S)) \leq \sum_{e \in E(T) \setminus E(S)} e^b w(e)$$

and $e^a \geq (1 - \delta)n$ for each $e \in E(T) \setminus E(S)$, we have

$$\sum_{e \in E(T) \setminus E(S)} e^a e^b w(e) \geq (1 - \delta)n \sum_{v \in V(G)} d_T(v, V(S)) \qquad (2)$$

For each $e \in E(S)$, since $e^a \geq e^b \geq \delta n$ and $e^a + e^b = n$, we have $e^a e^b \geq \delta(1 - \delta)n^2$.
Then

$$\sum_{e \in E(S)} e^a e^b w(e) \geq \sum_{e \in E(S)} \delta(1 - \delta)n^2 w(e) = \delta(1 - \delta)n^2 w(S) \qquad (3)$$

By Equations (1), (2), and (3), we obtain

$$c(T)/2 \geq (1 - \delta)n \sum_{v \in V(G)} d_T(v, V(S)) + \delta(1 - \delta)n^2 w(S).$$

□

Let $\hat{T}$ be the tree output by algorithm FIND-LART. The following lemma
shows the ratio of the routing cost of $\hat{T}$ and completes the proof of Theorem d

Lemma 14: For any integer $k \le 6\alpha - 3$, $c(\hat{T}) \le \frac{k+3}{k+1}\,\alpha\, c(T^*)$, where
$T^* = MRCT(G)$.

Proof: Let $\delta = 2/(k+3)$. By Lemma 11 there must exist a $\delta$-spine $Y$ of $T^*$
such that $|CAL(Y)| \le k$. Let $T$ be the tree constructed by algorithm FIND-LART
with $R$ selected in Step 2, in which $R = \{r_i \mid 1 \le i \le q\} = CAL(Y)$. Such
a tree $T$ always exists since we try all possible vertex subsets with no more than
$k$ vertices. Since $c(\hat{T}) \le c(T)$, we only need to prove the approximation ratio of
$T$. In the following, $T_0$ and $r$ are defined as in Section 4. Let $V = V(G) = V(T)$.
By Lemma 12,

$$c(T)/2 = \sum_{e \in E(T)} e^a e^b w(e) = \sum_{e \in E(T) \setminus E(T_0)} e^a e^b w(e) + \sum_{e \in E(T_0)} e^a e^b w(e) \qquad (4)$$

Similar to Equation (2) in Lemma 13 and observing that $e^a \le n$, we have

$$\sum_{e \in E(T) \setminus E(T_0)} e^a e^b w(e) \le n \sum_{v \in V} d_T(v, R) \qquad (5)$$

For each $v \in V(G)$, by Lemma El $d_T(v,r) \le \alpha d_G(v,r) = \alpha d_G(v,R)$. Since
$d_T(v,R) = d_T(v,r)$, we have $d_T(v,R) \le \alpha d_G(v,R)$. Thus,

$$n \sum_{v \in V} d_T(v, R) \le \alpha n \sum_{v \in V} d_G(v, R) \qquad (6)$$

Now turn to the second term in Equation (4). Since $e^a e^b \le n^2/4$ for any edge $e$,

$$\sum_{e \in E(T_0)} e^a e^b w(e) \le (n^2/4) \sum_{e \in E(T_0)} w(e) = (n^2/4)\, w(T_0) \qquad (7)$$

By Equations (4), (5), (6), and (7), we have

$$c(T)/2 \le \alpha n \sum_{v \in V} d_G(v, R) + (n^2/4)\, w(T_0) \qquad (8)$$

Now let us consider $T^*$. Let $Y = \{P_i \mid 1 \le i \le h\}$ and $S = \bigcup_{1 \le i \le h} P_i$ be the
minimal $\delta$-separator. Assume $P_i = SP_{T^*}(u_i, v_i)$, and let
$V_i = V \setminus VB(T^*, P_i, u_i) \setminus VB(T^*, P_i, v_i)$ for any $i = 1, 2, \ldots, h$, and let $V_0$ denote
$\bigcup_{u \in CAL(Y)} VB(T^*, S, u)$. We have $d_G(v, CAL(Y)) \le d_{T^*}(v, V(S))$ for any $v \in V_0$, and
$d_G(v, CAL(Y)) \le d_{T^*}(v, V(S)) + w(P_i)/2$ for any $v \in V_i$ by triangle inequality.
By the definition of $\delta$-spine, $|V_i| = P_i^c \le \delta n/2$. Therefore, by Equation (8), we have

$$c(T)/2 \le \alpha n \sum_{v \in V} d_G(v, R) + (n^2/4)\, w(T_0)$$
$$= \alpha n \sum_{v \in V} d_G(v, CAL(Y)) + (n^2/4)\, w(T_0)$$
$$\le \alpha n \Big( \sum_{v \in V} d_{T^*}(v, V(S)) + \sum_{1 \le i \le h} (\delta n/4)\, w(P_i) \Big) + (n^2/4)\, w(T_0)$$
$$\le \alpha n \sum_{v \in V} d_{T^*}(v, V(S)) + (\alpha \delta n^2/4)\, w(S) + (n^2/4)\, w(T_0)$$

Since $T_0$ is a minimum spanning tree on G\r and $S$ is a tree spanning $V(S) \supseteq R$,
we have $w(T_0) \le w(S)$. Then,

$$c(T)/2 \le \alpha n \sum_{v \in V} d_{T^*}(v, V(S)) + (\alpha \delta n^2/4 + n^2/4)\, w(S) \qquad (9)$$

By Lemma 13 and Equation (9),
$c(T) \le \max\{\frac{\alpha}{1-\delta}, \frac{\alpha\delta + 1}{4\delta(1-\delta)}\}\, c(T^*)$. Note that
$\alpha \ge 1$ and $0 < \delta < 1/2$. Let $g(\delta) = \max\{\frac{\alpha}{1-\delta}, \frac{\alpha\delta + 1}{4\delta(1-\delta)}\}$. When $\delta \ge 1/(3\alpha)$,
$g(\delta)$ decreases as $\delta$ decreases from $1/2$ to $1/(3\alpha)$. When
$\delta \le 1/(3\alpha)$, $g(\delta)$ increases as $\delta$ decreases from $1/(3\alpha)$.
Therefore, $g(\delta)$ reaches its minimum when $\delta = 1/(3\alpha)$. Since $\delta = 2/(k+3)$, we
conclude that $c(T) \le \frac{k+3}{k+1}\,\alpha\, c(T^*)$ for any $k \le 6\alpha - 3$. □

Abstract. We study a problem related to finding shortest paths in
weighted graphs. We ask whether or not there is a path between two 
nodes that is of a given cost. The edge weights of the graph can be both 
positive and negative integers, or even integer vectors. We show that most 
variants of this problem are NP-complete. We also develop a pseudo- 
polynomial algorithm for the case where the edge weights are integers. 
The running time of this algorithm is $O(M^2N^3 + |w| \min(|w|, M) N^2)$
where N is the number of nodes in the graph, M is the largest absolute 
value of any edge weight, and w is the target cost. The algorithm is based 
on preprocessing the graph with a relaxation algorithm to eliminate the 
effects of weight sign alternations along a path. 



1 Introduction 



Finding shortest paths in weighted graphs is one of the most central problems 
in graph algorithms, with plenty of applications. The problem is now well- 
understood, with several polynomial-time solution algorithms |3l Chap. 25-26]. 

Here we study a related problem. Instead of finding a path with minimum 
cost we ask whether or not there is a path between two nodes that is of a given 
cost. We formulate the problem in a general setting, in which each edge has a 
weight that is a $k$-vector of (both positive and negative) integers as follows.

Let $G$ be a weighted directed multi-graph with set of nodes $V(G)$ and set of
edges $E(G)$. Every edge $e = p \xrightarrow{w} q \in E(G)$ carries a cost vector
$weight(e) = w \in \mathbb{Z}^k$. The cost of a path

$$\mathcal{P} = p_0 \xrightarrow{w_1} p_1 \xrightarrow{w_2} p_2 \xrightarrow{w_3} \cdots \xrightarrow{w_m} p_m \qquad (1)$$

is $cost(\mathcal{P}) = \sum_{i=1}^{m} w_i$, the vector sum of all the edge costs along $\mathcal{P}$. We restrict
attention to paths whose length $m$ is not zero. Our problem is as follows.

Definition 1. Given two nodes $p, q \in V(G)$ from $G$, and a target cost vector
$w$, the FIXED DISTANCE PROBLEM (FDP) is to determine if there is a path
from $p$ into $q$ with cost exactly $w$ in $G$.

* Supported by Academy of Finland grant number 42977.

We will show FDP to be NP-complete. We also show it to remain NP-complete
even for scalar weights, i.e., when the costs are integers. Our main result is a
pseudo-polynomial time algorithm for the scalar version. The algorithm preprocesses
$G$ in time $O(M^2N^3)$ to account for sign alternations using a relaxation
technique. Here $N$ is the size of $V(G)$ and $M$ is the largest absolute value of
any edge weight in $G$. Obviously, a polynomial time algorithm is obtained if
$M$ is a constant. This happens for example when the weights are “trits” in
$T = \{-1, 0, +1\}$ (cf. H).

In addition, the same approach enables us to solve in pseudo-polynomial
time the problem of finding a path with smallest absolute cost between two given
nodes. This algorithm runs again in time $O(M^2N^3)$.

We would like to mention a couple of motivations for studying these problems. 
The first one comes from the analysis of multi-tape automata. In png a query 
language for string databases has been developed such that string processing 
is carried out by two-way multi-tape finite state automata. It is important to 
be able to determine whether or not a given query has a finite answer [3 I I 4] . 
This requires determining whether or not a two-way multi-tape automaton can 
loop by moving back and forth on its input tapes while producing output. The 
answer to the query is finite if there are no such loops. A sufficient condition
for this can be obtained using the transition graph of the automaton while ignoring
the tape contents. For a $k$-tape automaton this graph is a weighted graph whose
weights are in $T^k$. The sufficient condition to be tested is that the graph has no
loops with cost 0. Our second motivation stems from computational molecular
biology, namely from the DNA sequence assembly problem. This leads in its 
classical form into the shortest common superstring problem widely studied by 
the algorithmic community Chap. 16.17]. However, sometimes we know a 
priori the target length for the sequence to be constructed. This then reduces to 
the problem of finding a path of fixed length. 

Finally note that if FDP is further restricted to only those paths $\mathcal{P}$ such that
each prefix sum $s_i = \sum_{j=1}^{i} w_j$ is constrained to lie in the positive orthant (that
is, no component of any $s_i$ can be negative), then it becomes the reachability
problem in vector addition systems or Petri nets nmra.

2 FDP Is NP-Complete 

In this section we show our problem to be NP-complete. 

Theorem 1. FDP is NP-hard. 

Proof. Let us reduce the NP-complete INTEGER LINEAR PROGRAMMING
(ILP) problem [EJ Chap. 13] into FDP. An ILP instance (in standard form)
consists of $m$ linear equations $a_{i,1}x_1 + \cdots + a_{i,n}x_n = b_i$ with integer coefficients.
The problem is then to determine whether or not this group of equations has a
nonnegative integer solution.

Our reduction generates a graph $G$ with two nodes $p$ and $q$. Let $A_j$ denote
the column vector $(a_{1,j}, \ldots, a_{m,j})$ for each $j = 1, \ldots, n$, and let $b = (b_1, \ldots, b_m)$.
$G$ consists of a loop $p \xrightarrow{A_j} p$ for each $A_j$, and a bridge $p \xrightarrow{-b} q$. We ask whether this $G$
has a path from $p$ into $q$ with weight 0. Clearly the equations have a solution
$x_1 = c_1, \ldots, x_n = c_n$ if and only if $G$ contains a path from $p$ into $q$ with cost 0
such that the loop $p \xrightarrow{A_j} p$ is used exactly $c_j$ times, for $j = 1, \ldots, n$. □
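
The reduction can be sketched in a few lines. This is only an illustration under stated assumptions: edges are represented as hypothetical `(source, weight_vector, target)` tuples, and the bridge is given weight $-b$, which is the value forced by asking for a path of cost 0.

```python
def ilp_to_fdp(A, b):
    """Return the loops p -> p (one per column of A) and the bridge p -> q."""
    m, n = len(A), len(A[0])
    loops = [("p", tuple(A[i][j] for i in range(m)), "p") for j in range(n)]
    bridge = ("p", tuple(-bi for bi in b), "q")
    return loops, bridge


def path_from_solution(loops, bridge, c):
    """Use loop j exactly c[j] times, then cross the bridge into q."""
    path = []
    for loop, count in zip(loops, c):
        path.extend([loop] * count)
    path.append(bridge)
    return path


def cost(path):
    """Vector sum of the edge weights along a path."""
    k = len(path[0][1])
    return tuple(sum(edge[1][i] for edge in path) for i in range(k))
```

For the system $x_1 + 2x_2 = 5$, $3x_1 + x_2 = 5$ the solution $(1, 2)$ yields a path of cost $(0, 0)$, while the non-solution $(1, 1)$ does not.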

Let $sgn(w) \in T$ denote the sign of a given integer $w \in \mathbb{Z}$. It is shown next
that restricting the weight vectors into $T^k$ does not help in general. (However,
Corollary 2 below shows that the case of scalar weights from $T$ belongs to P.)

Corollary 1. FDP remains NP-hard, even if all weight vectors belong to $T^k$.

Proof. Consider the proof of Theorem 1. Recall that ILP is NP-complete even in
the strong sense [3 Problem MP1 and Chap. 4.2]. That is, there is a polynomial
$h$ so that already those ILP instances $X$, whose numbers have absolute value at
most $h$(length of $X$), form an NP-complete language. Call this language SILP.

SILP can be reduced to our more restricted problem as follows. Perform
first the reduction given in the proof of Theorem 1. Then repeatedly replace
an edge $a \xrightarrow{(v_1, \ldots, v_m)} b$ having some $v_j \notin T$ with a new node $c$ and a path

$$a \xrightarrow{(sgn(v_1), \ldots, sgn(v_m))} c \xrightarrow{(v_1 - sgn(v_1), \ldots, v_m - sgn(v_m))} b$$

until no such edges remain.
The existence of $h$ ensures that this remains polynomial with respect to the size
of the SILP instance. □

Theorem 2. FDP belongs to NP. 

Proof. The difficulty in this proof lies in the fact that the length of the path
witnessing the desired cost need not be of polynomial length with respect to the
size of the problem instance description: for example, in the graph with edges
$p \xrightarrow{2^n} p$ and $p \xrightarrow{-1} p$ even the shortest paths from $p$ back into itself with cost 0
have lengths at least $2^n$. We circumvent this difficulty by utilizing an implicit
representation for these paths. Intuitively, this representation is the number of
times each edge occurs on a path; technically, this is accomplished with an ILP.
We propose the following nondeterministic four-step algorithm:

1. Add into $E(G)$ a new edge $f = q \xrightarrow{0} p$ (even if $E(G)$ already contains such
an edge).

2. Guess a subgraph $H$ of $G$, which contains $f$ and forms a (not necessarily
maximal) strongly connected component.

3. Form the following ILP instance. There is one variable $c_e$ for every $e \in E(H)$.
There are no other variables. The constraints are as follows.

(a) The variable $c_f$ is constrained to be exactly 1.
All other variables are constrained to be at least 1.

(b) Add for every node $r \in V(H)$ the constraints $\sum_{e \in in(r)} c_e = \sum_{g \in out(r)} c_g$;
here $in(r)$ ($out(r)$, respectively) consists of the edges entering (exiting,
respectively) $r$ in $H$.

(c) Add finally the constraints $w = \sum_{e \in E(H)} c_e \cdot weight(e)$.

4. Guess and verify a solution to the ILP instance constructed in Step 3.

Assume for specificity that $G$ is given by listing $E(G)$ explicitly. Steps 1 to 3 can
be performed in nondeterministic polynomial time with respect to the size of
the problem instance description. Thus the generated ILP is also of polynomial
size, and step 4 can also be performed in nondeterministic polynomial time.
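
The deterministic part of this algorithm, checking a guessed subgraph and a guessed assignment of edge counts, can be sketched as follows. The representation is an assumption made for illustration: `edges` lists the edges of the guessed subgraph as `(source, weight_vector, target)` tuples, `counts[i]` is the guessed value of $c_e$ for `edges[i]`, and `f_index` marks the added edge $f$.

```python
def reachable(start, succ):
    """Nodes reachable from start in the graph given by a successor map."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in succ.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def strongly_connected(edges):
    """Check that the edges form a strongly connected subgraph (step 2)."""
    nodes = {e[0] for e in edges} | {e[2] for e in edges}
    fwd, bwd = {}, {}
    for s, _, d in edges:
        fwd.setdefault(s, []).append(d)
        bwd.setdefault(d, []).append(s)
    root = next(iter(nodes))
    return reachable(root, fwd) == nodes and reachable(root, bwd) == nodes


def verify_certificate(edges, counts, f_index, target):
    """Check constraints (a), (b), (c) of step 3 for the guessed counts."""
    if not strongly_connected(edges):
        return False
    # (a) c_f = 1 and every other count is at least 1
    if counts[f_index] != 1 or any(c < 1 for c in counts):
        return False
    # (b) every node is entered as often as it is exited
    balance = {}
    for (src, _, dst), c in zip(edges, counts):
        balance[src] = balance.get(src, 0) - c
        balance[dst] = balance.get(dst, 0) + c
    if any(v != 0 for v in balance.values()):
        return False
    # (c) the weighted sum of the edge costs equals the target vector
    k = len(target)
    total = tuple(
        sum(c * w[i] for (_, w, _), c in zip(edges, counts)) for i in range(k)
    )
    return total == tuple(target)
```

With the loops $p \xrightarrow{4} p$ and $p \xrightarrow{-1} p$, a bridge $p \xrightarrow{0} q$ and target 0, the counts (1, 1, 4, 1) describe a path of length 6 without ever listing it edge by edge.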

Let us then verify the correctness of this algorithm. Assume first that $G$ does
have a path $\mathcal{P}$ from $p$ into $q$ with cost exactly $w$. Denote as $Q$ the closed loop
induced by joining the last node $q$ of $\mathcal{P}$ into its first node $p$ with a new edge $f$
having $weight(f) = 0$. Then we show that choosing each $c_e$ to equal the number
of times edge $e$ occurs within $Q$ provides a solution to the ILP constructed in
step 3. Let $H$ consist of the nodes and edges that occur in $Q$. Constraints (3a) are
satisfied by the choice of values for the $c_e$. Constraints (3b) are satisfied, because
$Q$ is a closed loop, and therefore every node $r$ must be entered and exited an
equal number of times. Finally, constraints (3c) reduce to the requirement that
$cost(\mathcal{P}) = w$ by the choice of $weight(f)$.

Assume conversely that the algorithm succeeds, and construct a path $\mathcal{P}$ from
$p$ into $q$ with $cost(\mathcal{P}) = w$ as follows. By assumption, there is some subgraph
$H$ of $G$ guessed in step 2 for which there exist values for the variables $c_e$ that
satisfy the constraints constructed in step 3. Form a multi-graph $D$ by taking $c_e$
copies of each edge $e \in E(H)$. This $D$ does not omit any edges in $H$ because of
constraints (3a). On the other hand, $H$ was chosen to be strongly connected in
step 2. Therefore constraints (3b) ensure that $D$ has an Eulerian tour $Q$. Remove
from $Q$ the sole occurrence of $f$. This leaves an Eulerian path $\mathcal{P}$ from $p$ into $q$ in
$D$ with $cost(\mathcal{P}) = cost(Q) = w$ by constraints (3c), because $weight(f) = 0$ by
step 1. Path $\mathcal{P}$ is by construction also a path in $H$, and therefore in $G$ as well, if
instead of many copies of the same edge we use the same edge many times. □



3 Pseudo-Polynomial Time Algorithm for Scalar FDP 

Let us then consider FDP with edge costs restricted to scalars instead of vectors. 
At first sight, this does not seem to help: 

Theorem 3. FDP is NP-hard already for arbitrary scalar weights.

Proof. Recall the NP-complete PARTITION problem [3 Problem SP12]: given
a list $u_1, \ldots, u_m$ of nonzero natural numbers, determine whether the index set
$I = \{1, \ldots, m\}$ contains a subset $J$ such that $\sum_{j \in J} u_j = \sum_{j \in I \setminus J} u_j$. This is
easily encoded as a graph $G$ with nodes $V(G) = \{p_0, \ldots, p_m\}$ and edges
$E(G) = \{p_{i-1} \xrightarrow{+u_i} p_i,\; p_{i-1} \xrightarrow{-u_i} p_i : 1 \le i \le m\}$, and asking whether there exists a path
$\mathcal{P}$ with cost 0 from $p_0$ into $p_m$. Then $J$ corresponds to the numbers $u_j$ for which
the positive edge was chosen into $\mathcal{P}$ instead of the negative one. □
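
A minimal sketch of this encoding follows. Node $p_i$ is represented by the integer `i`, edges are hypothetical `(source, weight, target)` triples, and the brute-force check simply enumerates every path from $p_0$ into $p_m$ (one sign choice per layer), so it is exponential and meant only to illustrate the correspondence.

```python
from itertools import product


def partition_to_fdp(u):
    """Return (edges, source, target) of the layered graph for the list u."""
    edges = []
    for i, ui in enumerate(u, start=1):
        edges.append((i - 1, +ui, i))  # choosing u_i into J
        edges.append((i - 1, -ui, i))  # leaving u_i out of J
    return edges, 0, len(u)


def has_zero_cost_path(u):
    """Enumerate all p_0 -> p_m paths; each picks one sign per number."""
    return any(
        sum(s * ui for s, ui in zip(signs, u)) == 0
        for signs in product((+1, -1), repeat=len(u))
    )
```

For instance, `has_zero_cost_path([3, 1, 1, 2, 2, 1])` holds because 3 + 2 = 1 + 1 + 2 + 1, whereas `[1, 2, 4]` has an odd total and admits no zero-cost path.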

However, we develop a preprocessing stage, which yields a pseudo-polynomial 
solution to this sub-case. Preprocessing negative edge costs has already been 
suggested by Bertsekas |3 Sect. 4.2], but in a more restricted setting than ours.

3.1 Sign-Relaxation

Our preprocessing stage performs a certain relaxation operation on the given 
graph G with respect to the signs of its edge weights. The goal is to eliminate 
the need to consider paths, along which the edge signs change. Compared with 
other relaxation methods for path problems ^ Chap. 25.2-25.3], our method 
adds new “short-cut” edges instead of estimating path lengths. 

Definition 2. Graph $H$ is a sign-relaxation of graph $G$ if and only if the following
holds for all $p, q \in V(G)$: $G$ has a path $\mathcal{P}$ from $p$ into $q$ if and only if $H$
has a path $Q$ from $p$ into $q$ such that

1. $cost(Q) = cost(\mathcal{P})$, and

2. $sgn(weight(e)) = sgn(cost(Q))$ for all edges $e$ that appear in $Q$.

Efficient computation of these sign-relaxations would imply P = NP by the
proof of Theorem 3. The problem is that computation on the edge weights is
restricted to be polynomial with respect to the lengths of their representations, 
rather than their actual values. Thus we apply instead the following concept, 
which allows for adding these values, and leads therefore into the aforementioned 
pseudo-polynomiality. 

Definition 3. The sign-closure unsign($G$) of a graph $G$ is the closure of $E(G)$
with respect to the following edge addition rule: if $p \xrightarrow{u} q \xrightarrow{v} r \in E(unsign(G))$
and $sgn(u) \ne sgn(v)$, then also $p \xrightarrow{u+v} r \in E(unsign(G))$.

Example 1. Consider the graph $G_n^m$ with
$E(G_n^m) = \{p_{i-1} \xrightarrow{0} p_i : 0 < i < n\} \cup \{p_{n-1} \xrightarrow{0} p_0,\; p_0 \xrightarrow{-1} p_0,\; p_0 \xrightarrow{m} p_0\}$, where $m, n \in \mathbb{N}$.

1. The graph unsign($G_n^m$) contains all edges $p_0 \xrightarrow{-1} p_i$ for $i = 1, 2, 3, \ldots, n-1$
by applying the edge addition rule of Definition 3 into $p_0 \xrightarrow{-1} p_{i-1} \xrightarrow{0} p_i$.

2. Then from $p_{n-1} \xrightarrow{0} p_0 \xrightarrow{-1} p_{n-1}$ the rule introduces edge $p_{n-1} \xrightarrow{-1} p_{n-1}$.
Step 1 can then be repeated with $p_{n-1}$ in the role of $p_0$. In this way the
complete graph $K_n^{-1}$ of $n$ nodes and edges costing $-1$ emerges as a subgraph
into unsign($G_n^m$).

3. Step 2 can be repeated for the edge cost $m$ instead of $-1$, producing $K_n^m$.

4. The rule application on $p_0 \xrightarrow{m} p_0 \xrightarrow{-1} p_0$ introduces the edge $p_0 \xrightarrow{m-1} p_0$,
and then Step 3 can be applied for the edge cost $m - 1$, producing $K_n^{m-1}$.
This step can even be repeated for costs $m-2, m-3, m-4, \ldots, 0$.

Hence unsign($G_n^m$) is the complete graph of $n$ nodes and all edge costs in the
range $-1, \ldots, m$. Thus $|E(unsign(G_n^m))| = (m+2)n^2$ even though $|E(G_n^m)| = n + 2$. ■

1: $H \leftarrow G$; $W \leftarrow E(G)$;
2: while $W \ne \emptyset$ do
3:   Delete an arbitrary $p \xrightarrow{w} q \in W$;
4:   for all $(q \xrightarrow{v} r \in E(H)) \wedge (sgn(v) \ne sgn(w)) \wedge (e = p \xrightarrow{w+v} r \notin E(H))$ do
5:     $E(H) \leftarrow E(H) \cup \{e\}$; $W \leftarrow W \cup \{e\}$
6:   end for
7:   for all $(r \xrightarrow{v} p \in E(H)) \wedge (sgn(v) \ne sgn(w)) \wedge (e = r \xrightarrow{v+w} q \notin E(H))$ do
8:     $E(H) \leftarrow E(H) \cup \{e\}$; $W \leftarrow W \cup \{e\}$
9:   end for
10: end while

Fig. 1. A sign-closure algorithm. 
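
The worklist algorithm of Fig. 1 can be sketched in Python as below. This is an illustrative sketch, not the adjacency-matrix implementation analysed later: edges are hypothetical `(p, w, q)` triples held in hash sets, so the running time bound of the paper is not claimed for it.

```python
from collections import deque


def sgn(x):
    """Sign of an integer as -1, 0 or +1."""
    return (x > 0) - (x < 0)


def unsign(edges):
    """Close a set of (p, w, q) edges under the edge addition rule."""
    H = set(edges)
    out_edges, in_edges = {}, {}
    for p, w, q in H:
        out_edges.setdefault(p, set()).add((w, q))
        in_edges.setdefault(q, set()).add((p, w))
    work = deque(H)

    def add(e):
        # A new edge enters the worklist exactly when it enters H.
        if e not in H:
            H.add(e)
            p, w, q = e
            out_edges.setdefault(p, set()).add((w, q))
            in_edges.setdefault(q, set()).add((p, w))
            work.append(e)

    while work:
        p, w, q = work.popleft()
        # Combine with edges leaving q (steps 4-6 of Fig. 1).
        for v, r in list(out_edges.get(q, ())):
            if sgn(v) != sgn(w):
                add((p, w + v, r))
        # Combine with edges entering p (steps 7-9 of Fig. 1).
        for r, v in list(in_edges.get(p, ())):
            if sgn(v) != sgn(w):
                add((r, v + w, q))
    return H
```

On a three-node cycle of weight-0 edges with loops of weight $-1$ and $2$ at one node, the closure contains every weight in $-1, \ldots, 2$ between every ordered pair of nodes, that is 36 edges.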

The algorithm in Fig. 1 computes $H = unsign(G)$. Assume for its analysis
that $V(G) = \{1, \ldots, N\}$, and denote as $M = \max\{|u| : a \xrightarrow{u} b \in E(G)\}$ the
magnitude of the edge weights. Assume further that these weights contain both
positive and negative values. Observe finally that the edge weights in unsign($G$)
lie in the same range as in $G$, because the rule only adds two weights of
different signs, so that $|u + v| \le \max(|u|, |v|)$.

Our computational model is a Random Access Machine (RAM), which can
perform arithmetic on integers of magnitude $\max(M, N)$ in constant time. We
now prove that our preprocessing stage is pseudo-polynomial and correct.

Theorem 4. The algorithm in Fig. 1 runs in time $O(M^2N^3)$.

Proof. Let us consider the following implementation. Represent $H$ as an $N \times N$
adjacency matrix $A$ of integer lists, such that the list $A[p,q]$ contains $w$ if and only
if $p \xrightarrow{w} q \in H$. Moreover, these lists are kept in strictly ascending order. The
maximum length of any such list is $2M + 1$ by our earlier observation. Then the first
interior loop in steps 4-6 runs in time $O(MN)$ by list merging:

for $r \leftarrow 1, \ldots, N$ do
  Let list $L$ contain those elements of $A[q,r]$ with a different sign than $w$, in
  order;
  Add $w$ into every element in $L$;
  Merge $L$ into $A[p,r]$ so that whenever a new element $u$ is to be added, add
  also the corresponding edge $p \xrightarrow{u} r$ into $W$
end for

The second interior loop in steps 7-9 runs also in time $O(MN)$ with the same
technique. On the other hand, an edge is added into $W$ exactly when it is added
into the graph under construction. Therefore $W$ can be implemented with for
example a list without the need to worry about duplicates, yielding $O(1)$ time
insertion and deletion operations. The outer loop in steps 2-10 is executed at
most $O(MN^2)$ times by the same reason. (It is executed $\Omega(MN^2)$ times in
Example 1.) □

Theorem 5. Graph unsign($G$) is a sign-relaxation of graph $G$.

Proof. We show $H = unsign(G)$ to satisfy the equivalence in Definition 2.

The forward direction is shown as follows. Consider any path $\mathcal{P}$ of $G$ as in
(1), and consider those pairs $p_{i-1} \xrightarrow{w_i} p_i \xrightarrow{w_{i+1}} p_{i+1}$ of consecutive edges along
$\mathcal{P}$ such that $sgn(w_i) \ne sgn(w_{i+1})$; in other words, those nodes $p_i$, where the edge
addition rule of Definition 3 could be applied.

If there are no such $p_i$, then choosing $Q = \mathcal{P}$ suffices: edge weight signs do
not change along $\mathcal{P}$, and $E(G) \subseteq E(unsign(G))$ by definition.

Otherwise select a zenith $z$ from these $p_i$ by further requiring that the absolute
value $|s_i|$ of the corresponding prefix sum $s_i$ mentioned in Sect. 1 must
reach a maximum. Then divide $\mathcal{P}$ into the prefix $\mathcal{A}$ from $p_0$ into $z$, and the
suffix $\mathcal{B}$ from $z$ into $p_m$. $\mathcal{A}$ has now strictly fewer nodes, where the rule of
Definition 3 could be applied, and therefore inductively unsign($G$) does have a path $\mathcal{A}'$
from $p_0$ into $z$ such that $cost(\mathcal{A}') = cost(\mathcal{A})$ and $sgn(u) = sgn(cost(\mathcal{A}'))$ for all
edge weights $u$ along $\mathcal{A}'$. Similarly $\mathcal{B}$ has a corresponding path $\mathcal{B}'$ in unsign($G$)
from $z$ into $p_m$ such that $cost(\mathcal{B}') = cost(\mathcal{B})$ and $sgn(v) = sgn(cost(\mathcal{B}'))$ for all
edge weights $v$ along $\mathcal{B}'$. Moreover, $sgn(cost(\mathcal{B}')) \ne sgn(cost(\mathcal{A}')) \ne 0$ by the
choice of $z$. Therefore the edge addition rule of Definition 3 applies in the
position $a \xrightarrow{u} z \xrightarrow{v} b$ of the combined path $\mathcal{A}'\mathcal{B}'$. Perform then the following rule
applications:

- If $sgn(u + v) = sgn(cost(\mathcal{A}'))$, then omit $z$ from the path by adding the edge
  $a \xrightarrow{u+v} b$. The rule applications continue with $b$ as the next zenith, unless
  $b = p_m$, in which case rule applications cease.

- If $sgn(u + v) = -sgn(cost(\mathcal{A}'))$, then symmetrically to the previous case, $z$
  is omitted by $a \xrightarrow{u+v} b$, but now $a$ becomes the next zenith, unless $a = p_0$.

- If $sgn(u + v) = 0$, then omit $z$ as above with $a \xrightarrow{0} b$. Then $a$ is a possible
  choice as the next zenith, unless $a = p_0$. Similarly, $b$ is another possibility,
  unless $b = p_m$. If neither is possible, rule applications cease.

When these rule applications cease, the result is a path $Q$ in unsign($G$), such
that $cost(Q) = cost(\mathcal{A}') + cost(\mathcal{B}') = cost(\mathcal{P})$ and $sgn(w) = sgn(cost(Q))$ for all
weights $w$ along $Q$, as required.

The converse direction of the equivalence is a consequence of the following
claim: if $e = p \xrightarrow{w} q \in E(unsign(G))$, then $G$ contains a path $\mathcal{P}$ from $p$ into $q$
with $cost(\mathcal{P}) = w$. This claim is shown by induction on the genealogy of $e$.

If $e \in E(G)$, then $\mathcal{P} = e$ suffices. Otherwise $e$ appeared into unsign($G$) as
the result of applying the rule of Definition 3 into some edge pair $p \xrightarrow{v} r \xrightarrow{w-v} q$,
which has already been added into unsign($G$), and $sgn(v) \ne sgn(w - v)$. Then
inductively $G$ has a path $\mathcal{A}$ from $p$ into $r$ with $cost(\mathcal{A}) = v$, and a path $\mathcal{B}$ from $r$
into $q$ with $cost(\mathcal{B}) = w - v$. Then path $\mathcal{A}\mathcal{B}$ is also in $G$ and $cost(\mathcal{A}\mathcal{B}) = w$. □

3.2 Solving the Scalar Version with Sign-Relaxation 

Here we use the sign-relaxation concept introduced in Sect. 3.1 to provide
algorithms for the scalar FDP. Our general method is to first use the algorithm in
Fig. 1 and then post-process the result with known graph algorithms.

First we show that restricting the edge weights to be both scalars and in $T$ at
the same time yields a polynomial algorithm. Let $\mu(N)$ denote the time required
to multiply together two $N \times N$ Boolean matrices (where addition is disjunction
and multiplication conjunction); currently $\mu(N) = O(N^{2.376})$.

Corollary 2. FDP can be solved in $O(N^3 + \mu(N) \log_2 |w|)$ time for weights in
$T$, in which $w$ is the target cost.

Proof. Compute first unsign($G$) with the algorithm in Fig. 1; this takes $O(N^3)$
time by Theorem 4. By our earlier observation, the resulting matrix $A$ can be
represented with three $N \times N$ Boolean matrices $A_s$, where $s \in T$, so that $A_s[p,q] = 1$
if and only if $A[p,q]$ contains $s$.

By Theorem 5, the original graph $G$ has a path from node $p$ into node $q$ with
cost $w \ne 0$ if and only if the graph (whose adjacency matrix is) $A_{sgn(w)}$ has a
path from $p$ into $q$ with length $|w|$. Similarly for $w = 0$ we see that $A_0$ must have
a path from $p$ into $q$, and this can be checked within the stated time bound.

Whether $A_s$ has a path from $p$ into $q$ of length $\ell > 0$ can in turn be decided
by returning $A_s^{\ell}[p,q]$, the appropriate element of the $\ell$th power of $A_s$. This power
can be computed with $O(\log_2 \ell)$ matrix multiplications by processing $\ell$ bit by
bit, as is well known ^ Chap. 26.1]. □
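
The last step, deciding whether a Boolean adjacency matrix admits a path of an exact length, can be sketched as follows. The naive cubic matrix product below merely stands in for a fast $\mu(N)$ multiplication, and the function names are illustrative.

```python
def bool_matmul(X, Y):
    """Boolean matrix product: OR over k of (X[i][k] AND Y[k][j])."""
    n = len(X)
    return [
        [any(X[i][k] and Y[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def bool_matpow(A, length):
    """Boolean power A^length for length >= 1, processing length bit by bit."""
    result = None
    base = A
    while length > 0:
        if length & 1:
            result = base if result is None else bool_matmul(result, base)
        base = bool_matmul(base, base)
        length >>= 1
    return result


def has_path_of_length(A, p, q, length):
    """True if the graph with adjacency matrix A has a p -> q path of that length."""
    return bool(bool_matpow(A, length)[p][q])
```

On the directed 3-cycle 0 → 1 → 2 → 0, a path from node 0 back to itself exists for length 3 but not for length 4.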

The algorithm in the proof of Corollary 2 provides also our first
pseudo-polynomial solution to the scalar version of FDP, in which the edge weights are
no longer constrained to $T$: expand $G$ into this unary form by adding at most
$2M - 2$ new nodes for each original node in $G$, as in ^ Lemma 12].

However, when $|w|$ is moderate with respect to $M$ and $N$ (like $O(MN)$,
say), we can improve on this first algorithm by post-processing
the sign-closure of the graph with a breadth-first search instead of
matrix multiplication of its unary representation.

Corollary 3. FDP can be solved in $O(M^2N^3 + |w| \min(|w|, M) N^2)$ time for
general scalar weights.

Proof. The adjacency matrix $A$ for unsign($G$) can be constructed in $O(M^2N^3)$
time by Theorem 4. If $w = 0$, then the problem reduces again to checking whether
or not unsign($G$) has a path of edges with weight 0 from node $p$ into node $q$.
Assume therefore $w > 0$, and consider the post-processing algorithm in Fig. 2;
the case $w < 0$ is obtained by switching signs.

Let $F$ be the subgraph of unsign($G$) consisting of the edges with sign $+1$.
Its adjacency matrix, which is also denoted with $F$ in the algorithm of Fig. 2,
can be built from $A$ in $O(MN^2)$ time; the algorithm in Fig. 1 can even maintain
pointers to the first elements of the suffix $F[i,j]$ of each edge list $A[i,j]$ to build
$F$ during the construction of $A$ at no extra cost. By Theorem 5 the original
graph $G$ has a path from node $p$ into node $q$ with cost $w$ if and only if one can
be found in $F$ already.

The invariant of the main loop in steps 5-21 is as follows. List $C[i]$ contains
$u$ if and only if $u > 0$ and there exists a path $\mathcal{P}$ in $F$ from node $p$ into node $i$
with cost $D + u \le w$ such that the last edge in $\mathcal{P}$ has cost at least $u$.

1: $D \leftarrow 0$;
2: for all $i \leftarrow 1, \ldots, N$ do
3:   Let list $C[i]$ be the elements of list $F[p,i]$ in the range $1, \ldots, w$ in the same order
4: end for
5: while $D < w$ do
6:   Let $B$ consist of those indices $i$ such that the first element $\delta$ of list $C[i]$ is smallest
     among the first elements of all the nonempty lists in the array $C$;
7:   if $B = \emptyset$ then
8:     $D \leftarrow w$
9:   else
10:    $D \leftarrow D + \delta$;
11:    for all $i \in B$ do
12:      Remove the first element $\delta$ from the list $C[i]$
13:    end for
14:    for all $j \leftarrow 1, \ldots, N$ do
15:      Subtract $\delta$ from every element in the list $C[j]$;
16:      for all $i \in B$ do
17:        Merge into list $C[j]$ all elements of list $F[i,j]$ in range $1, \ldots, w - D$
18:      end for
19:    end for
20:  end if
21: end while
22: Answer $(D = w) \wedge (q \in B)$

Fig. 2. A post-processing algorithm for FDP. 
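
The idea behind this post-processing, exploring the positive-cost subgraph $F$ in order of increasing accumulated cost, can be illustrated with a simplified variant. The sketch below is not the list-merging algorithm of Fig. 2: it keeps one set of nodes per exact cost value and runs in time proportional to $w$ times the number of edges of $F$. The dictionary representation of $F$ is an assumption made for the example.

```python
def positive_fdp(F, p, q, w):
    """Decide whether F has a path from p into q with cost exactly w > 0.

    F maps each node to a list of (cost, successor) pairs, all costs > 0.
    """
    # reach[d] holds the nodes reachable from p with accumulated cost d.
    reach = [set() for _ in range(w + 1)]
    reach[0].add(p)  # the empty path serves only as the starting point
    for d in range(w):
        for i in reach[d]:
            for cost, j in F.get(i, ()):
                if d + cost <= w:
                    reach[d + cost].add(j)
    # Every edge has positive cost, so cost w > 0 implies at least one edge.
    return q in reach[w]
```

With `F = {0: [(2, 1), (3, 0)], 1: [(1, 0)]}`, node 1 is reachable from node 0 with costs 2, 5, 8, and so on, but not with cost 4.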



This invariant is first established by the initializations in steps 1-4. Assume
then that this invariant holds for the current $D < w$. If $C[i] = \emptyset$, then $F$ has no
paths from node $p$ into node $i$ with cost in the range $I = \{D+1, \ldots, D+M\}$
by the invariant. If $B = \emptyset$, then accordingly $F$ has no paths from node $p$ with
cost in $I$, and therefore $F$ cannot have any paths from node $p$ with cost more
than $D + M$ either. Hence the algorithm can justifiably reply ‘no’ in this case.

Assume then $B \ne \emptyset$. By the invariant, $i \in B$ if and only if $F$ contains some
path $\mathcal{P}_i$ from node $p$ into node $i$ with cost $D + \delta$; moreover, there are no paths
from node $p$ with cost in the range $D+1, \ldots, D+\delta-1$ by the minimality of $\delta$.
The overall effect of the operations in steps 10-19 amounts to expanding each
such path $\mathcal{P}_i$ with one additional edge having cost no greater than $w - D - \delta$,
reinstating the invariant by moving $\delta$ from the elements of the lists $C[j]$ into
$D$, and removing the leading zero elements that would have appeared whenever
$j \in B$. In particular, if $D$ is no longer smaller than $w$, the main loop terminates:
a path from node $p$ into node $q$ with cost $w$ was found if and only if a path $\mathcal{P}_q$
with cost $w$ was found. The reply of the algorithm is again justified.

Each list $C[i]$ will contain a strictly ascending sequence of numbers within
the range $1, \ldots, L = \min(w, M)$, extracted by using the aforementioned suffix
pointers. Therefore the list processing steps 15 and 17 take time $O(L)$. The
construction of the set $B$ in step 6 can be performed in time $O(N)$. Then the
interior loop in steps 14-19 takes time $O(LN^2)$ and dominates the time spent
in the main loop in steps 5-21. This main loop is executed at most $w$ times,
leading to the stated time bound of $O(wLN^2)$ for the entire algorithm. □

4 Minimum Absolute Cost Paths 

Let us now turn our attention from FDP into the following minimization problem,
which is readily solved with the sign-closure approach developed in Sect. 3.1.



Definition 4. Given two nodes $p, q \in V(G)$ of a scalar graph $G$, what is the
minimum absolute cost of any path from $p$ into $q$ in $G$, if any?

This problem is similar to asking for the cost of the shortest path from p into q in 
graphs with both negative and positive edge weights p 00 mil Chap. 25.3]. 
However, finding a negative cost cycle warrants responding $-\infty$ in the shortest
path problem, whereas our problem is “walking the tight-rope” by balancing the 
positive and negative contributions as well as possible. 

Corollary 4. The problem in Definition 4 can be solved in time $O(M^2N^3)$.

Proof. By Theorem 5 the original graph $G$ has a path $\mathcal{P}$ from node $p$ into node $q$
with cost $w$ if and only if $H = unsign(G)$ has a corresponding path $Q$ from $p$ into
$q$ with cost $w$ via edges having the same sign as $w$. Assume furthermore that $|w|$
is minimal among all the paths $\mathcal{P}'$ from $p$ into $q$ with $sgn(cost(\mathcal{P}')) = sgn(w)$,
and let $e = a \xrightarrow{u} b$ be an edge in $Q$. Graph $H$ cannot contain another edge
$f = a \xrightarrow{v} b$ with $sgn(v) = sgn(w)$ but $|v| < |u|$, because path $Q'$, which is as
$Q$, except that $f$ is taken instead of $e$, still has $sgn(cost(Q')) = sgn(w)$ but
$|cost(Q')| < |cost(Q)| = |w|$. Then $G$ contains a path $\mathcal{P}'$ corresponding to $Q'$,
which contradicts the minimality of $|w|$. Hence a path of minimum absolute cost
with edges of a given sign must always choose the edges of minimum absolute
weight with the correct sign in $H$.

Compute therefore first $H$ (that is, its adjacency matrix $A$) with the
implementation of the algorithm in Fig. 1 given in the proof of Theorem 4;
this takes $O(M^2N^3)$ time. Construct similarly to the proof of Corollary 2 three
matrices $A_s$, $s \in T$, where $A_s[p,q] = \min\{|u| : sgn(u) = s \wedge u \text{ is in list } A[p,q]\}$.
(If there are no such $u$, then $A_s[p,q] = +\infty$.) By the reasoning above, it suffices
to find shortest paths in the graphs (whose adjacency matrices are) $A_s$.

Therefore this problem reduces to first checking if there is a path from $p$
into $q$ in $A_0$, and answering 0 if this is the case; or otherwise determining (with
for example Dijkstra’s algorithm 0 Chap. 25.2]) the lengths $\ell_{\pm 1}$ of the shortest
paths from $p$ into $q$ in $A_{\pm 1}$. Then the answer is $-\ell_{-1}$ if $\ell_{-1} < \ell_{+1}$, and $\ell_{+1}$
otherwise. (The answer $+\infty$ means there are no paths from $p$ into $q$ at all in $G$.)
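
This post-processing can be sketched as follows, assuming the sign-closure has already been computed and is passed in as a set of hypothetical `(a, u, b)` edge triples. The sketch uses a binary heap rather than the adjacency-matrix form of Dijkstra's algorithm, and it returns `None` instead of $+\infty$ when no path exists; both are choices made for the example.

```python
import heapq


def sgn(x):
    """Sign of an integer as -1, 0 or +1."""
    return (x > 0) - (x < 0)


def min_abs_cost(closure, p, q):
    """Signed cost of a minimum absolute cost path from p into q, or None."""
    # best[s][(a, b)] is the smallest |u| among closure edges a -> b of sign s.
    best = {s: {} for s in (-1, 0, 1)}
    for a, u, b in closure:
        s = sgn(u)
        if abs(u) < best[s].get((a, b), float("inf")):
            best[s][(a, b)] = abs(u)

    def shortest(s):
        adj = {}
        for (a, b), c in best[s].items():
            adj.setdefault(a, []).append((c, b))
        # Dijkstra restricted to paths with at least one edge.
        heap = list(adj.get(p, ()))
        heapq.heapify(heap)
        done = set()
        while heap:
            d, x = heapq.heappop(heap)
            if x in done:
                continue
            done.add(x)
            if x == q:
                return d
            for c, y in adj.get(x, ()):
                if y not in done:
                    heapq.heappush(heap, (d + c, y))
        return float("inf")

    if shortest(0) == 0:
        return 0
    neg, pos = shortest(-1), shortest(1)
    if neg == pos == float("inf"):
        return None
    return -neg if neg < pos else pos
```

For the graph with edges $0 \xrightarrow{3} 1$ and $1 \xrightarrow{-2} 2$, whose closure adds $0 \xrightarrow{1} 2$, the answer for the pair (0, 2) is 1.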

This post-processing can be implemented to take $O(N^2)$ time. (Note also
that using the adjacency matrix representation is natural, because the graph $H$
is likely to be dense as the closure of the edge addition rule by Definition 3.) □

5 Conclusions

We studied whether a directed graph has a path of exactly given total cost, 
where the edge weights can be negative as well as positive integers, or even 
vectors of such integers. Although the problem was shown to be NP-complete 
in most cases, we also gave pseudo-polynomial algorithms for some of these cases. 
Our algorithms were based on a relaxation algorithm to eliminate the effects of 
sign changes along the paths of the given graph. This relaxation algorithm was 
then used as a preprocessing stage, after which known graph algorithms were 
applicable, even though they do not normally cope with negative weights. 

Acknowledgment. The authors thank E. Mayr for suggesting the use of ILP
in Theorem 0

Abstract. In this paper we consider the version of the classical Towers
of Hanoi games where the game-board contains more than three pegs.
For $k$ pegs we give a $2^{C_k n^{1/(k-2)}}$ lower bound on the number of steps
necessary for transferring $n$ disks from one peg to another. Apart from the
value of the constants $C_k$ this bound is tight.




1 Introduction 

“In an ancient city, so the legend goes, monks in a temple had to move a pile 
of 64 sacred disks from one location to another. The disks were fragile; only 
one could be carried at a time. A disk could not be placed on top of a smaller, 
less valuable disk. In addition, there was only one other location in the temple 
(besides the original and destination locations) sacred enough for a pile of disks 
to be placed there. 

Using the intermediate location, the monks began to move disks back and 
forth from the original pile to the pile at the new location, always keeping the 
piles in order (largest on the bottom, smallest on the top). According to the 
legend, before the monks could make the final move to complete the new pile in 
the new location, the temple would turn to dust and the world would end.” ca

There is a single player game, called “The Towers of Hanoi,” originating from
the above legend with the following rules:

The game board has three pegs and n disks, which are originally arranged 
on the first peg, largest at the bottom and each disk sitting on a larger disk. 
The goal of the game is to transfer the n disks to another peg while observing 
the following conditions: 

1. In each step the topmost disk on a peg is removed and placed on the top of 
the disks on another peg. 

2. A disk cannot be placed on a disk smaller than itself. 



The game has become one of the most popular examples for recursive algorithms.
The unique shortest solution requires $2^n - 1$ steps. Since
there is not much mathematical mystery left about the original game, its lovers
developed various versions of it. In this article we consider a version
where the number of pegs is some fixed $k \ge 3$. Define $h = k - 2$, and
$s$ as the unique integer for which $\binom{h+s-1}{h} < n \le \binom{h+s}{h}$.
Different algorithms have been presented (see e.g. [7]) that require exactly

$$a_k(n) = 2^s n - \sum_{t=1}^{s} 2^{t-1} \binom{h+t-1}{h} \qquad (1)$$

disk moves when transferring the pile of disks from the first peg to the second peg.
To understand the above formula a little better observe that $a_k(n) - a_k(n-1) = 2^s$,
where $s$ is defined as above. This amount is perceived as the increase in the
number of steps caused by the presence of the $n$th disk. (The formula appearing
in the earlier literature is equivalent to (1).) The order of magnitude of $a_k(n)$ is
$2^{B_k n^{1/(k-2)}}$, where

$$B_k = (1 \pm o(1)) \, ((k-2)!)^{1/(k-2)}.$$

Our main result (Corollary 2) is a lower bound of the form $2^{(1 - o(1)) C_k n^{1/(k-2)}}$ for the
number of necessary moves to solve the $k$ pegs version of the Towers of Hanoi
game, which is optimal up to a constant factor in the exponent for fixed $k$.



Table 1. The values of $a_k(n)$ for $k \le 5$ and $n \le 10$.

k\n    1   2   3   4   5   6    7    8    9    10
3      1   3   7  15  31  63  127  255  511  1023
4      1   3   5   9  13  17   25   33   41    49
5      1   3   5   7  11  15   19   23   27    31
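
As an illustrative check of formula (1) and Table 1, the following Python sketch computes $a_k(n)$ in three ways: by summing the per-disk increments $2^s$, by one closed form consistent with those increments, and by a recursive move count in the style usually attributed to Frame and Stewart. The function names, the particular closed form, and the recursion are assumptions made for illustration; the text only refers to the cited algorithms without spelling them out.

```python
from functools import lru_cache
from math import comb


def level(k: int, n: int) -> int:
    """The unique s with C(h+s-1, h) < n <= C(h+s, h), where h = k - 2 and n >= 1."""
    h, s = k - 2, 0
    while comb(h + s, h) < n:
        s += 1
    return s


def a_increments(k: int, n: int) -> int:
    """a_k(n) as the sum of the increments a_k(j) - a_k(j-1) = 2^s over disks j = 1..n."""
    return sum(2 ** level(k, j) for j in range(1, n + 1))


def a_formula(k: int, n: int) -> int:
    """A closed form consistent with the increments (assumed reading of formula (1))."""
    if n == 0:
        return 0
    h, s = k - 2, level(k, n)
    return 2 ** s * n - sum(2 ** (t - 1) * comb(h + t - 1, h) for t in range(1, s + 1))


@lru_cache(maxsize=None)
def frame_stewart(k: int, n: int) -> int:
    """Move count of the recursive strategy: park m disks using k pegs,
    move the remaining n - m disks using k - 1 pegs, then bring the m disks back."""
    if n <= 1:
        return n
    if k == 3:
        return 2 ** n - 1
    return min(2 * frame_stewart(k, m) + frame_stewart(k - 1, n - m) for m in range(1, n))


if __name__ == "__main__":
    for k in (3, 4, 5):
        print(k, [a_increments(k, n) for n in range(1, 11)])
```

Running the script prints the three rows of Table 1; the recursion is an upper-bound strategy only and says nothing about optimality, which is the subject of the lower bound below.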



2 Motivation 

The question about the optimal number of steps for the k peg variant of the 
Towers of Hanoi game was raised in the American Mathematical Monthly in 
1939 (Problem 3918). The two solutions that it received in 1941 claim
that the exact bound is $a_k(n)$, but the proofs are valid only for a very restricted
class of algorithms. No unconditional lower bound has existed so far even though
a few computer scientists have implemented algorithms for the problem
(including a nice web version). Since only a minimal background is required
to understand the topic, the design of such algorithms is an ideal project for
interested students.

The lower bound proof we present may also serve as a toy example for more 
involved complexity theoretic lower bound proofs, and its structure may be 
worthwhile to study for its own sake. 

3 The Lower Bound Proof 

Our lower bound will hold in a more general setup, namely when our only re- 
quirement is that between the initial and the final configuration each disk moves 
at least once. This generalization will allow us to use an induction. Getting the 
precise bound on the original problem might require a different kind of argument. 

Definition 1. For a game board with $k$ pegs an arrangement of $n$ disks is called
a configuration if it obeys the “smaller disk on top of the larger” rule. For a
configuration $D$ as above let $g(D)$ be the minimal number of steps required to
get every disk moved at least once, where all moves are taken according to rules
1. and 2. of the introduction. Let us define $g(n,k) = \min_D g(D)$, where $D$ runs
over all possible configurations of $n$ disks on a game board with $k$ pegs.

Remark 1. $g(D)$ is finite for every configuration $D$.

Proof. Since $g(D_0)$ is known to be finite, where $D_0$ is the configuration where
all disks pile up on the first peg, it is enough to show that this configuration
can be reached from every other configuration. We can show this by using an
induction on $n$. □

Theorem 1. For $k \ge 3$

$$g(n,k) \ge 2^{(1 \pm o(1)) C_k n^{1/(k-2)}}.$$

Here the constants $C_k$ depend on $k$ in the following way:

$$C_k = \frac{1}{2} \left( \frac{12}{k(k-1)} \right)^{1/(k-2)}.$$

Proof. We proceed by induction on $k$.

Remark 2. We shall make a little effort to get the optimal lower bound for
$g(n,3)$, even though it would be enough for us to show that $g(n,3) = \Omega(2^n)$. We
shall show that $g(n,3) \ge 2^{n-2} + 1$. From the configuration where the largest and
the second largest disks are around the first peg, and all other disks are around
the second peg we can see that this bound is sharp.

The case $k = 3$: We can suppose that $n \ge 2$. Consider an arbitrary initial
arrangement of $n$ disks. Let $S$ be a sequence of steps in which every disk
moves at least once. Let $j$ denote the first step that moves the largest disk. Let
$S_1$ be the sequence of steps that precede $j$, and let $S_2$ be the sequence of all
steps following $j$. In the configurations before and after step $j$ the disks other
than the largest disk are piled up in a single tower. On the other hand, by our
assumption, in either $S_1$ or in $S_2$ the second largest disk (which is at the bottom
of the pile before and after step $j$) should move at least once. By symmetry we
can assume that this happens during the moves of $S_2$.

It is easy to see that then $S_2$ must contain a solution to the three peg Towers
of Hanoi problem on the $n - 2$ smallest disks, so $S_2$ must be at least $2^{n-2} - 1 + 1$
long. Here $2^{n-2} - 1$ comes from the lower bound for the classical game of Hanoi
(see e.g. [6]) to which we can add one, because the second largest disk must also
move. $S$ contains at least one more step than $S_2$ (namely step $j$), so we obtain:
$g(n,3) \ge 2^{n-2} + 1$.
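
For very small $n$ the quantity $g(n,k)$ of Definition 1 can be computed by exhaustive search, which gives a sanity check of the bound $g(n,3) \ge 2^{n-2} + 1$ and of its sharpness. The sketch below is an assumption-laden illustration rather than part of the proof: it encodes a configuration as a tuple giving the peg of each disk (disk 0 is the smallest), tracks which disks have already moved in a bitmask, and runs a breadth-first search started from all configurations at once. The name `min_steps_all_moved` is hypothetical.

```python
from collections import deque
from itertools import product


def min_steps_all_moved(n: int, k: int) -> int:
    """Brute-force g(n, k): fewest legal steps, over all start configurations,
    after which every one of the n disks has moved at least once."""
    full = (1 << n) - 1
    # Every assignment of disks to pegs is a legal configuration, because the
    # order of the disks on each peg is forced by their sizes.
    starts = [(cfg, 0) for cfg in product(range(k), repeat=n)]
    dist = {state: 0 for state in starts}
    queue = deque(starts)
    while queue:
        cfg, moved = queue.popleft()
        steps = dist[(cfg, moved)]
        if moved == full:
            return steps
        for disk in range(n):
            # A disk is on top of its peg iff no smaller disk shares the peg.
            if any(cfg[small] == cfg[disk] for small in range(disk)):
                continue
            for peg in range(k):
                # The target peg must differ and must hold no smaller disk.
                if peg == cfg[disk] or any(cfg[small] == peg for small in range(disk)):
                    continue
                nxt = (cfg[:disk] + (peg,) + cfg[disk + 1:], moved | (1 << disk))
                if nxt not in dist:
                    dist[nxt] = steps + 1
                    queue.append(nxt)
    raise ValueError("unreachable: every disk can always be moved eventually")


if __name__ == "__main__":
    for n in range(2, 6):
        print(n, min_steps_all_moved(n, 3), 2 ** (n - 2) + 1)
```

The state space has $k^n 2^n$ elements, so the search is only practical for a handful of disks; it is meant to make the definition concrete, not to scale.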

The case $k \ge 4$: First we prove a lemma which serves as the main lemma in our
argument:

Lemma 1. Suppose $k \ge 4$ and $0 \le m \le n/2k$. Then:

$$g(n,k) \ge 2 \min(g(n - 2km, k),\, g(m, k-1)).$$

Proof. We say that a sequence $S$ of steps moves a set $H$ of disks if for every
$h \in H$ there exists a step in $S$ made by $h$. Let us call $Z$ (small disks) the set of
the smallest $n - 2km$ disks.

Consider an arbitrary configuration $D$ of the disks around $k$ pegs. There is
a peg around which we have at least $2m$ disks from the largest $2km$ disks. Let
us call $X$ (extra large disks) the set of the largest $m$ disks in this peg. Let us
call $L$ (large disks) the set of the next largest $m$ disks in this peg. Note that
$X$ and $L$ depend on $D$ while $Z$ does not. Clearly $Z$, $L$ and $X$ are disjoint,
$|X| = |L| = m$, $|Z| = n - 2km$. Moreover every disk in $X$ is larger than any
disk in $L$ and they are all larger than any disk in $Z$. Consider a sequence of steps
that moves all the disks starting from the initial configuration $D$. Define $S_1$ as
the initial sequence of steps up to (but excluding) the first step by the topmost
(i.e. the smallest) disk of $X$, and let $S_2$ be the sequence of all remaining steps.
Obviously $S_1$ moves $L$ and $S_2$ moves $X$. Moreover if $S_1$ does not move $Z$ then
the peg on which an idle member of $Z$ is sitting is completely useless for the
disks in $L$ since they are all larger than any of the disks in $Z$.

This allows us in this case to estimate the number of steps made by the
elements of $L$ by $g(m, k-1)$. The same argument shows that $S_2$ either moves $Z$
or only $k - 1$ pegs were used when making the steps with the disks in $X$. Since
according to the above argument both $S_1$ and $S_2$ contain at least
$\min(g(n - 2km, k),\, g(m, k-1))$ steps the proof of the lemma follows. □



Let us denote $\log_2 g(n,k)$ by $\phi(n,k)$.

Corollary 1. For $k \ge 4$ and $0 \le m \le n/2k$

$$\phi(n,k) \ge 1 + \min(\phi(n - 2km, k),\, \phi(m, k-1))$$

holds.

Lemma 2. $\phi(n,k)$ is a monotone increasing function of $n$ for any fixed $k \ge 3$.

Lemma 2 follows from the fact that extra disks just make the task harder.

Lemma 3. Suppose $k \ge 4$ and $\phi(n_i, k-1) \ge i$ for $i = 1, \dots, s$. Then

$$\phi\Big(2k \sum_{i=1}^{s} n_i,\, k\Big) \ge s.$$

Proof. We proceed by induction on $s$. The $s = 1$ case is straightforward. For
$s \ge 2$:

$$\phi\Big(2k \sum_{i=1}^{s} n_i,\, k\Big) = \phi\Big(2k n_s + 2k \sum_{i=1}^{s-1} n_i,\, k\Big) \ge 1 + \min\Big(\phi\Big(2k \sum_{i=1}^{s-1} n_i,\, k\Big),\, \phi(n_s, k-1)\Big) \ge s.$$

The first inequality comes from the corollary of Lemma 1, the second comes
from the induction hypothesis on $s - 1$ and the assumption of the lemma on
$\phi(n_s, k-1)$. □



Lemma 4. Suppose the Theorem holds for $k - 1$. Define $n_i$ as the smallest
element of the set $\{n \mid \phi(n, k-1) \ge i\}$. Then:

$$\sum_{i=1}^{s} n_i \le (1 \pm o(1)) \frac{s^{k-2}}{(k-2)\, C_{k-1}^{k-3}}.$$

Proof. Since according to our assumption the theorem holds for $k - 1$ we have
the $(1 \pm o(1))\, C_{k-1} n^{1/(k-3)}$ lower bound on $\phi(n, k-1)$. Combining this with the
monotonicity of $\phi(n, k-1)$ in $n$ we get the asymptotic upper bound on the
integral of the inverse required by the lemma. □

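Now we are ready to prove Theorem 1 for $k \ge 4$ assuming that it is
true for $k - 1$. Let us denote $\sum_{i=1}^{s} n_i$ by $N_s$. From Lemma 4 we have
$N_s \le (1 \pm o(1)) \frac{s^{k-2}}{(k-2)\, C_{k-1}^{k-3}}$. On the other hand Lemma 3 says that
$\phi(2kN_s, k) \ge s$ for every $s$. These two inequalities provide us with an asymptotic
upper bound on the inverse of $\phi(n,k)$, which can be easily turned into the following
asymptotic lower bound on $\phi(n,k)$ using the monotonicity in $n$:

$$\phi(n,k) \ge (1 \pm o(1))\, C_{k-1}^{(k-3)/(k-2)} \left( \frac{k-2}{2k} \right)^{1/(k-2)} n^{1/(k-2)}.$$

Calculation shows that $C_k = C_{k-1}^{(k-3)/(k-2)} \left( \frac{k-2}{2k} \right)^{1/(k-2)}$, which together
with $C_3 = 1$ gives the constants of the Theorem. □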
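
The step from the recurrence for the constants to a closed form can be checked numerically. The sketch below assumes the recurrence $C_k = C_{k-1}^{(k-3)/(k-2)} ((k-2)/(2k))^{1/(k-2)}$ with base value $C_3 = 1$ (read off from the bound $g(n,3) \ge 2^{n-2} + 1$) and compares it with the closed form $C_k = \frac{1}{2} (12/(k(k-1)))^{1/(k-2)}$. Both function names are hypothetical, and the check is floating-point only.

```python
from math import isclose


def c_recursive(k: int) -> float:
    """Unroll the recurrence for C_k starting from the base case C_3 = 1."""
    c = 1.0
    for j in range(4, k + 1):
        c = c ** ((j - 3) / (j - 2)) * ((j - 2) / (2 * j)) ** (1 / (j - 2))
    return c


def c_closed(k: int) -> float:
    """Closed form C_k = (1/2) * (12 / (k (k - 1)))^(1 / (k - 2))."""
    return 0.5 * (12 / (k * (k - 1))) ** (1 / (k - 2))


if __name__ == "__main__":
    for k in range(3, 11):
        print(k, round(c_recursive(k), 6), round(c_closed(k), 6))
```

Equivalently, raising the recurrence to the power $k-2$ gives $C_k^{k-2} = C_{k-1}^{k-3} \cdot \frac{k-2}{2k}$, and the product telescopes to $12 / (2^{k-2} k (k-1))$.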

Corollary 2. The $k$ peg version of the Towers of Hanoi problem requires at least
$2^{(1 - o(1)) C_k n^{1/(k-2)}}$ steps.

Abstract. We consider dynamic evaluation of algebraic functions (matrix
multiplication, determinant, convolution, Fourier transform, etc.) in
the model of Reif and Tate; i.e., if $f(x_1, \dots, x_n) = (y_1, \dots, y_m)$ is an
algebraic problem, we consider serving on-line requests of the form “change
input $x_i$ to value $v$” or “what is the value of output $y_i$?”. We present
techniques for showing lower bounds on the worst case time complexity
per operation for such problems. The first gives lower bounds in a wide
range of rather powerful models (for instance history dependent algebraic
computation trees over any infinite subset of a field, the integer RAM,
and the generalized real RAM model of Ben-Amram and Galil). Using
this technique, we show optimal $\Omega(n)$ bounds for dynamic matrix-vector
product, dynamic matrix multiplication and dynamic discriminant and
an $\Omega(\sqrt{n})$ lower bound for dynamic polynomial multiplication
(convolution), providing a good match with Reif and Tate’s $O(\sqrt{n \log n})$
upper bound. We also show linear lower bounds for dynamic determinant,
matrix adjoint and matrix inverse and an $\Omega(\sqrt{n})$ lower bound for the
elementary symmetric functions. The second technique is the communication
complexity technique of Miltersen, Nisan, Safra, and Wigderson
which we apply to the setting of dynamic algebraic problems, obtaining
similar lower bounds in the word RAM model. The third technique
gives lower bounds in the weaker straight line program model. Using this
technique, we show an $\Omega((\log n)^2 / \log\log n)$ lower bound for dynamic
discrete Fourier transform. Technical ingredients of our techniques are
the incompressibility technique of Ben-Amram and Galil and the lower
bound for depth-two superconcentrators of Radhakrishnan and Ta-Shma.
The incompressibility technique is extended to arithmetic computation
in arbitrary fields.

Due to the space constraints imposed by these proceedings, in this version
of the paper we only present the third technique, proving the lower bound
for dynamic discrete Fourier transform, and refer to the full version of the
paper, which is currently available as a BRICS technical report, for the
rest of the proofs.

1 Introduction

Reif and Tate [RT97] considered the following setup of dynamic algebraic
algorithms. Let $f_1, \dots, f_m$ be a system of $n$-variate polynomials over a
commutative ring or rational functions over a field. We seek an algorithm that, when
given an initial input vector $x = (x_1, x_2, \dots, x_n)$ to the system, does some
preprocessing and then afterwards is able to efficiently handle on-line requests of
two forms: “change$_k(v)$: Change $x_k$ to the new value $v$” and “query$_k$: Return
the value of output $f_k(x)$”. Reif and Tate provided two general techniques for
the design of efficient dynamic algebraic algorithms. They also presented lower
bounds and time-space trade-offs for several problems. Apart from Reif and
Tate’s work, we also meet dynamic algebraic problems in the literature on the
PREFIX SUM problem: the specific
case of $f_i(x) = \sum_{j=1}^{i} x_j$ for $i = 1, \dots, n$.
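
To make the change/query interface concrete, here is a minimal Python sketch of a dynamic solution for PREFIX SUM over the integers. It uses a binary indexed (Fenwick) tree, a standard data structure that is not taken from this paper; the class name `DynamicPrefixSum` and the 1-based indexing are assumptions chosen to mirror change$_k(v)$ and query$_k$. Each operation touches $O(\log n)$ stored values.

```python
class DynamicPrefixSum:
    """Maintains x_1..x_n under change(k, v) and answers query(k) = x_1 + ... + x_k."""

    def __init__(self, xs):
        self.n = len(xs)
        self.x = [0] * self.n            # current inputs (0-based storage)
        self.tree = [0] * (self.n + 1)   # Fenwick tree, 1-based
        for k, v in enumerate(xs, start=1):
            self.change(k, v)            # preprocessing via n change operations

    def change(self, k: int, v: int) -> None:
        """change_k(v): set x_k to v, updating every tree node that covers index k."""
        delta = v - self.x[k - 1]
        self.x[k - 1] = v
        while k <= self.n:
            self.tree[k] += delta
            k += k & -k                  # jump to the next covering node

    def query(self, k: int) -> int:
        """query_k: return the k-th output f_k(x) = x_1 + ... + x_k."""
        total = 0
        while k > 0:
            total += self.tree[k]
            k -= k & -k                  # strip the lowest set bit
        return total


if __name__ == "__main__":
    ps = DynamicPrefixSum([3, 1, 4, 1, 5])
    print(ps.query(3))
    ps.change(2, 10)
    print(ps.query(3), ps.query(5))
```

The example only illustrates the request model; the lower bounds discussed in this paper concern other problems and other cost measures.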

The aim of this paper is to present three techniques for showing lower bounds 
for dynamic algebraic problems. We use them to show lower bounds on the worst 
case time complexity per operation for several natural problems where Reif and 
Tate had no lower bounds or only lower bounds for the time-space trade-off. 



1.1 Problems Considered 

Given a commutative ring R, we look at the following systems of functions. 

MATRIX-VECTOR MULTIPLICATION : $R^{n^2+n} \to R^n$. The first $n^2$ components
of the input are interpreted as an $n \times n$ matrix $A$, the last $n$ components are
interpreted as an $n$-vector $x$, and $Ax$ is returned.

MATRIX MULTIPLICATION : $R^{2n^2} \to R^{n^2}$. The input is interpreted as two
$n \times n$ matrices which are multiplied.

CONVOLUTION : $R^{2n} \to R^{2n-1}$. The input is interpreted as two $n$-vectors
$x = (x_0, \dots, x_{n-1})$ and $y = (y_0, \dots, y_{n-1})$, whose convolution is returned. That is,
the $i$'th component of the output is $z_i = \sum_{j+k=i} x_j y_k$.

DETERMINANT : $R^{n^2} \to R$. The input is interpreted as a matrix, whose
determinant is returned.

MATRIX ADJOINT : $R^{n^2} \to R^{n^2}$ is the function that maps an $n \times n$ matrix
$A$ into the corresponding adjoint matrix given by
MATRIX ADJOINT$(A)_{ij} = (-1)^{i+j} \det(A_{ji})$, where $A_{ji}$ denotes the
$(n-1) \times (n-1)$ matrix resulting when
deleting the $j$'th row and the $i$'th column from $A$.

If $k$ is a field, MATRIX INVERSE : $k^{n^2} \to k^{n^2}$ is the partial function that maps a
nonsingular $n \times n$ matrix $A$ into the corresponding inverse matrix $A^{-1}$. Note that
for a nonsingular matrix, MATRIX INVERSE$(A) = \frac{1}{\det(A)}$ MATRIX ADJOINT$(A)$.

DISCRIMINANT : $R^n \to R$. The discriminant of the polynomial for which the
$n$ inputs are roots is returned, i.e.

$$\mathrm{DISCRIMINANT}(x_1, \dots, x_n) = \prod_{i \ne j} (x_i - x_j).$$

SYMMETRIC : $R^n \to R^n$. All $n$ elementary symmetric polynomials of the
inputs are computed, i.e., the $j$'th component of the output is

$$y_j = \sum_{I \subseteq \{1,2,\dots,n\},\ |I| = j} \ \prod_{i \in I} x_i.$$

POLYNOMIAL EVALUATION : $R^{n+2} \to R$. A vector $(x, a_0, a_1, \dots, a_n)$ is mapped
to $a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n$.

Finally, the following problem is defined for any algebraically closed field $k$.
Let $\omega$ be a primitive $n$'th root of unity in $k$, and let $F$ be the $n \times n$ matrix
$F = (\omega^{ij})_{i,j}$. The Discrete Fourier Transform DFT : $k^n \to k^n$ is the map $x \mapsto Fx$.

1.2 Models of Computation 

A pivotal issue when considering lower bounds is the model of computation. 
For dynamic algebraic problems, this issue is quite subtle; models can vary ac- 
cording to the algebraic domain (reals, integers, finite fields, etc.), the atomic 
operations allowed (only arithmetic operations or more general operations), and 
the possibility of influencing the control flow of the solution (to what extent is 
the sequence of atomic operations performed allowed to depend on the previous 
history of the algorithm). We prove lower bounds in the following models of 
computation. 

The straight line program model. This is the most basic model. Given the 
problem of dynamic evaluation of a function $f : k^n \to k^m$, we assign a straight
line program to each of the operations change$_1$, change$_2$, ..., change$_n$, query$_1$,
query$_2$, ..., query$_m$. The programs corresponding to the change-operations take
a single input $x$ and have no output, while the programs corresponding to the
query-operations have no input but one output. Each program is a sequence
of instructions of the form $y_i \leftarrow y_j \circ y_k$, where $\circ \in \{+, -, \cdot, /\}$ and $y_j$ and
$y_k$ are either input variables, memory variables, or constants. We assume for
convenience that we always initialize to some specific input vector and assign a 
corresponding initial value to each variable which appears somewhere in one of 
the programs. The complexity of a solution is the length of the longest program 
in the solution. 

History dependent algebraic computation trees. In the straight line program 
model, it is not possible for the algorithm to modify the sequence of atomic op- 
erations performed. In the history dependent algebraic computation tree model, 
we allow the algorithm to control the sequence in a strong way. First, instead 
of assigning straight line programs to operations, we assign algebraic computa- 
tion trees. As branching nodes, we do not just allow <-comparison (which only 
makes sense for certain fields), instead we allow branching according to arbitrary 
predicates of finite arity. Also, to each operation (such as change$_1$) we assign
not one, but several (in fact infinitely many) algebraic computation trees: One 
for each history, where a history is every bit of discrete information the system 
has obtained so far; namely, the sequence of input variables that were changed 
and output variables that were queried, and the result of every branching test 
made so far during the execution of the operations performed. When we execute
an operation, we find the tree corresponding to the current history and
execute that. The complexity of a solution is the depth of its deepest tree. For 
an example of an algorithm where the added power of the history dependent 
computation tree over straight line programs seems necessary, see the algorithm 
for DISCRIMINANT in Section 1.4. 

Random access machine models. A very general way of defining RAM mod- 
els is outlined by Ben-Amram and Galil [BAG92] . Here, we will only give an 
informal discussion. A RAM has an infinite number of registers, indexed by the 
integers. It also has a finite number of CPU-registers with proper names. Each
register contains an element of the domain of computation: if we consider com- 
putation over the reals, each register contains a real; if we consider computation 
over the integers, each register contains an integer. In any case, it is convenient 
if the integers (or at least a sufficiently large subset of the integers) is a subset 
of the domain of interest; this makes indirect addressing possible, an important 
feature of the RAM. The machine operates on the memory using a finite pro- 
gram containing the following kinds of instructions: direct and indirect reads 
and writes, conditional jumps and a finite number of atomic computational
instructions operating on the CPU-registers. Each instruction is executed at unit
cost. When the domain of the registers is the set of integers and the atomic
operations are $+$, $-$, $*$, we get the integer RAM. Another model of interest is the
generalized real RAM [BAG92]. Here, the registers contain arbitrary reals and
as atomic operations we allow any set of functions $\mathbb{R}^c \to \mathbb{R}$ for a constant $c$,
with the property that for each function there is a countable closed set $C \subset \mathbb{R}^c$,
so that the function is continuous in $\mathbb{R}^c \setminus C$.

The word RAM [FW93, FW94, Hag98] has a somewhat different flavor from
the integer RAM and the real RAM. The integer RAM can be considered
unreasonably powerful, since it can handle arbitrary integers with unit cost. Then
again, the user can give it any sequence of $n$ integers as input and measure the
complexity of the computation as a function of $n$. The word RAM is the result of
relaxing the power of both parties, the algorithm and the user. The word RAM
does computation on words, i.e. integers in $\{0, 1, \dots, 2^w - 1\}$ for some parameter
$w$, intuitively determined at compile-time. The RAM has registers indexed by
$\{0, 1, \dots, 2^w - 1\}$; in particular, we assume $w \ge \log n$, so that the input can be
given in registers and read. The RAM can operate on words using a number of
unit cost operations including addition, subtraction, multiplication, integer
division, bitwise Boolean operations, and left and right shifts. The algorithm should
be correct for any value of $w \ge \log n$, but $n$, the number of words in the problem,
should be the only variable appearing in the time bound. The word RAM has
been extensively studied as a model for sorting and searching. The survey of
Hagerup [Hag98] gives a good overview of this literature. When considered as
a model for dynamic algebraic problems, the word RAM is appropriate when
the function in question is a constant degree polynomial over the integers. This
ensures that when the input is a vector of words, the output can be given in a
constant number of words, i.e. we can at least report the output with unit cost.
For instance, dynamic matrix multiplication makes good sense in the word RAM
model while we will not consider dynamic determinant in this model.



1.3 Our Results 

We present three techniques for proving lower bounds for dynamic algebraic 
problems. The first technique is very robust. In particular, it holds under a wide
range of assumptions about the algebraic domain and the operations allowed,
and even if the algorithm is allowed to control the flow of computation in strong
ways. The technique is closely related to the incompressibility technique of
Ben-Amram and Galil [BAG92]. The second technique holds only for the word RAM
model (where the first technique fails). It is a modest extension of communication
complexity techniques of Miltersen et al. With the first and second
technique we show

Theorem 1. Any solution to dynamic MATRIX-VECTOR MULTIPLICATION,
MATRIX MULTIPLICATION, MATRIX ADJOINT, MATRIX INVERSE, DETERMINANT,
POLYNOMIAL EVALUATION or DISCRIMINANT has worst case complexity $\Omega(n)$
per operation and any solution to dynamic CONVOLUTION or SYMMETRIC has
worst case complexity $\Omega(\sqrt{n})$ per operation, in the following models of
computation:

- Straight line programs over any fixed finite field (except for POLYNOMIAL
EVALUATION, DISCRIMINANT and SYMMETRIC), with the allowed set of
change-arguments being the field itself.

- History dependent algebraic computation trees over any infinite field, with
the allowed set of change-arguments being any infinite subset of the field.

- The integer RAM (except for MATRIX INVERSE), with the allowed set of
change-arguments being any infinite subset of the integers, and the
generalized real RAM, with the allowed set of change-arguments being the reals.

- The word RAM (except for MATRIX ADJOINT, MATRIX INVERSE,
DETERMINANT, POLYNOMIAL EVALUATION, DISCRIMINANT and SYMMETRIC), with
the allowed set of change-arguments being the set of words.

We should note that the lower bound for dynamic polynomial evaluation 
was also proved by Reif and Tate, though not for as wide a range of models as 
above. Reif and Tate present lower bounds for a number of other problems by 
reductions from polynomial evaluation; we can apply the same reductions 
to get the lower bounds in the wider range of models. 

We should also note that for certain models and certain of the above problems,
there is an easier way of showing the same lower bound. For instance, we
can show a lower bound for dynamic MATRIX-VECTOR MULTIPLICATION over
the reals using arithmetic operations as follows: It is well known [Win67, Win70]
that $n \times n$ matrices $A$ over the reals exist so that computing $x \mapsto Ax$ requires
$\Omega(n^2)$ arithmetic operations. Now, given an alleged dynamic algorithm for
dynamic MATRIX-VECTOR MULTIPLICATION with complexity $o(n)$ per operation,
we can initialize the matrix input to this matrix. Then, we can evaluate $Ax$ for
any given $x$ using $n$ change and $n$ query operations, i.e., a total of $o(n^2)$
arithmetic operations, a contradiction. The same technique was, in fact, used by Reif
and Tate to show the lower bounds of their paper (using the fact that explicit
hard polynomials exist, rather than the fact that explicit hard matrices exist).
However, this argument does not seem to generalize to show, for instance, the
linear lower bound for straight line programs over a finite field (where matrices
requiring $\Omega(n^2)$ arithmetic operations do not exist [Sav74]), nor to show any
lower bound for the generalized real RAM or the word RAM. Also, our technique
applies to a wider variety of problems in a uniform way.

Our third technique is more fragile. It only works in the model of history
independent straight line programs. A technical ingredient of the technique is
the lower bound for depth-two superconcentrators by Radhakrishnan and
Ta-Shma. With the third technique we show

Theorem 2. Any solution to dynamic DFT in the straight line program model
over an algebraically closed field of characteristic 0, with change-arguments
restricted to any infinite subset of the field, has worst case complexity
$\Omega((\log n)^2 / \log\log n)$ per operation.

Due to the space constraints imposed by these proceedings, we shall in this
paper only present our third technique, proving Theorem 2 and only for the case
where the allowed set of input arguments is the entire field (and not some infinite
subset). For a full account of the proof of Theorem 2, the proof of Theorem 1,
and our first and second techniques, we refer the reader to the technical report
BRICS RS-98-11, available at www.brics.dk.



1.4 Optimality (and Otherwise) of Results 

The lower bounds for MATRIX-VECTOR MULTIPLICATION and MATRIX
MULTIPLICATION are tight; there are straightforward linear upper bounds. The lower
bound for DISCRIMINANT is also tight; there is a linear upper bound for any
infinite field (see Theorem 3), and a straightforward constant upper bound for any
finite domain in the straight line program model. Interestingly, the linear upper
bound does not seem to be implementable in the straight line program model.
The lower bound for CONVOLUTION has a fairly good match in the $O(\sqrt{n \log n})$
upper bound of Reif and Tate [RT97] for the same problem. The upper and lower
bounds for DETERMINANT, MATRIX ADJOINT, MATRIX INVERSE and SYMMETRIC
are not tight; we don’t know any solution for DETERMINANT, MATRIX ADJOINT
and MATRIX INVERSE better than evaluating queries from scratch, and we don’t
know any better upper bound for dynamic SYMMETRIC than a (not quite obvious)
$O(n)$ upper bound (see Theorem 4). Reif and Tate show an $O(\sqrt{n})$ upper
bound for dynamic DFT which is valid in the straight line program model. This
leaves a rather large gap between upper and lower bounds.

Theorem 3. There is a computation tree solution of complexity $O(n)$ for
dynamic evaluation of DISCRIMINANT. The solution works over any field.

Proof. All the current inputs $x_1, \dots, x_n$ are maintained, and so is the set of
their (distinct) values together with the number of occurrences in a structure
$L = \{[v_1, n_1], \dots, [v_{|L|}, n_{|L|}]\}$, i.e. $n_i \ge 1$ and $\sum_i n_i = n$. Finally, we maintain
the (nonzero) discriminant of the distinct values: $D = \prod_{i \ne j} (v_i - v_j)$. With this
representation query is simple; if all $n_i$'s are 1, we return $D$, otherwise we return
0. For change, we must update $D$ and $L$, which is easily done in linear time (see
Figure 1).

change$_i(v)$ :  assume $x_i = v_k$ for $[v_k, n_k] \in L$;
               if $n_k > 1$ then $n_k := n_k - 1$
               else $D := D / \prod_{j \ne k} \big(-(v_j - v_k)^2\big)$; $L := L \setminus \{[v_k, 1]\}$;
               if $v = v_l$ for some $[v_l, n_l] \in L$ then $n_l := n_l + 1$
               else $D := D \cdot \prod_{j} \big(-(v_j - v)^2\big)$; $L := L \cup \{[v, 1]\}$;
               $x_i := v$;

Fig. 1. Computation tree solution for DISCRIMINANT.
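
The update rule of Figure 1 can be sketched in Python over the rationals. This is an illustrative sketch under stated assumptions, not the authors' implementation: the class name `DynamicDiscriminant` is hypothetical, `fractions.Fraction` stands in for an arbitrary field, a dictionary plays the role of the structure $L$, and each removed or added value contributes the factor $(u - v)(v - u)$ for every other distinct value $u$, matching $D = \prod_{i \ne j} (v_i - v_j)$ over ordered pairs.

```python
from fractions import Fraction
from itertools import permutations


class DynamicDiscriminant:
    """Maintains prod_{i != j} (x_i - x_j) under single-input changes."""

    def __init__(self, xs):
        self.x = [Fraction(v) for v in xs]
        self.count = {}                       # distinct value -> multiplicity (the structure L)
        for v in self.x:
            self.count[v] = self.count.get(v, 0) + 1
        self.D = Fraction(1)                  # discriminant of the distinct values, never zero
        for a, b in permutations(self.count, 2):
            self.D *= a - b

    def query(self) -> Fraction:
        # The discriminant vanishes as soon as two inputs coincide.
        return self.D if len(self.count) == len(self.x) else Fraction(0)

    def change(self, i: int, v) -> None:
        v = Fraction(v)
        old = self.x[i]
        if self.count[old] > 1:
            self.count[old] -= 1
        else:
            del self.count[old]
            for u in self.count:              # divide out both ordered pairs (u, old), (old, u)
                self.D /= (u - old) * (old - u)
        if v in self.count:
            self.count[v] += 1
        else:
            for u in self.count:              # multiply in both ordered pairs (u, v), (v, u)
                self.D *= (u - v) * (v - u)
            self.count[v] = 1
        self.x[i] = v


if __name__ == "__main__":
    d = DynamicDiscriminant([1, 2, 3])
    print(d.query())
    d.change(0, 3)
    print(d.query())
    d.change(2, 5)
    print(d.query())
```

The branch on whether a value already occurs is the kind of history-dependent control flow that a straight line program cannot express, which is consistent with the remark that the linear upper bound does not seem implementable in that model.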

Theorem 4. There is a straight line program solution of complexity $O(n)$ for
SYMMETRIC. The solution works over any commutative ring.

Proof. All the current inputs $x_1, \dots, x_n$ and corresponding outputs $y_1, \dots, y_n$
are maintained. This makes the straight-line program for query$_i$ trivial; it needs
only return $y_i$. For the implementation of change, we observe that for any $i, k$,
we have that $y_k = x_i z_{k-1,i} + z_{k,i}$, where $z_{k,i}$ does not depend on $x_i$, which makes
the solution in Figure 2 valid.

change$_i(v)$ :  $z_0 := 1$;
               for $k = 1 \dots n$ do
                   $z_k := y_k - x_i z_{k-1}$;
                   $y_k := z_k + v z_{k-1}$;
               $x_i := v$;

Fig. 2. Straight line solution for SYMMETRIC.
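
The loop of Figure 2 translates almost line by line into Python. The sketch below is illustrative only: the class name `DynamicSymmetric` is hypothetical, Python integers stand in for a commutative ring, and the initial outputs are computed by expanding $\prod_i (1 + x_i t)$, which is one convenient way to obtain the elementary symmetric polynomials.

```python
class DynamicSymmetric:
    """Maintains the elementary symmetric polynomials y_1..y_n of x_1..x_n."""

    def __init__(self, xs):
        self.x = list(xs)
        n = len(self.x)
        self.y = [1] + [0] * n               # y[0] = 1 is the empty product
        for v in self.x:                     # multiply the generating polynomial by (1 + v t)
            for k in range(n, 0, -1):
                self.y[k] += v * self.y[k - 1]

    def query(self, k: int):
        """query_k: return the k-th elementary symmetric polynomial (1 <= k <= n)."""
        return self.y[k]

    def change(self, i: int, v) -> None:
        """change_i(v) as in Figure 2, with 0-based i; uses O(n) ring operations."""
        z_prev = 1                           # z_0
        for k in range(1, len(self.x) + 1):
            z_k = self.y[k] - self.x[i] * z_prev   # part of y_k that does not involve x_i
            self.y[k] = z_k + v * z_prev           # re-insert the new value v
            z_prev = z_k
        self.x[i] = v


if __name__ == "__main__":
    s = DynamicSymmetric([2, 3, 5])
    print([s.query(k) for k in (1, 2, 3)])
    s.change(1, 7)
    print([s.query(k) for k in (1, 2, 3)])
```

No branching on the data and no division are needed, which is why this solution fits the straight line program model over any commutative ring.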



2 Lower Bound for Dynamic DFT 

Our technique is essentially based on the following incompressibility statement: If
$k$ is an algebraically closed field, a rational map $k^n \to k^{n-1}$ cannot be injective.
Thus, it is closely related to the technique of Ben-Amram and Galil, who applied
incompressibility in various domains to show a gap between the power of random
access machines and pointer machines [BAG92].

First, a technical lemma stating a generalization of the above fact. Let $k$
be an algebraically closed field. Recall that an algebraic subset $W \subseteq k^n$ is an
intersection of sets of the form $\{x \in k^n \mid p(x) = 0\}$, where $p$ is a non-trivial
multivariate polynomial.

Lemma 5. Let $k$ be an algebraically closed field. Let $W$ be an algebraic subset
of $k^m$ and let $\phi = (f_1/g_1, \dots, f_n/g_n) : k^m \setminus W \to k^n$ be a rational map where
$f_i, g_i \in k[x_1, \dots, x_m]$ for $i = 1, \dots, n$. Assume that there exists $y \in k^n$ such that
$\phi^{-1}(y)$ is non-empty and finite. Then $m \le n$.

Using an approach similar to Valiant, we shall apply the notion of
a superconcentrator to get a setting where we may use the incompressibility
lemma. The equivalence of the following to the standard definition is due to
Meshulam [Mes84]. An $n$-superconcentrator of depth 2 is a graph $G$ with nodes
$X \cup V \cup Y$, where $X$, $V$ and $Y$ are disjoint, $|X| = |Y| = n$, and with edges
$E \subseteq (X \times V) \cup (V \times Y)$ such that for any $l$, for any $X_1 \subseteq X$ and for any $Y_1 \subseteq Y$ with
$|X_1| = |Y_1| = l$, we have $|N(X_1) \cap N(Y_1)| \ge l$, where $N(X_1), N(Y_1) \subseteq V$ denote
the neighbors to $X_1, Y_1$. Radhakrishnan and Ta-Shma proved that the
number of edges in an $n$-superconcentrator of depth 2 is $\Omega(n (\log n)^2 / \log\log n)$.
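
The depth-2 condition above is easy to test exhaustively on tiny graphs, which may help in reading the definition. The following Python sketch is a brute-force checker under assumed conventions: inputs and outputs are numbered 0..n-1, and `x_nbrs[i]` and `y_nbrs[j]` are the sets of middle nodes adjacent to input i and output j. The function name is hypothetical, and the running time is exponential in n.

```python
from itertools import combinations


def is_depth2_superconcentrator(n, x_nbrs, y_nbrs) -> bool:
    """Check |N(X1) & N(Y1)| >= l for all X1, Y1 of equal size l = 1..n."""
    for l in range(1, n + 1):
        for xs in combinations(range(n), l):
            n_x = set().union(*(x_nbrs[i] for i in xs))      # N(X1)
            for ys in combinations(range(n), l):
                n_y = set().union(*(y_nbrs[j] for j in ys))  # N(Y1)
                if len(n_x & n_y) < l:
                    return False
    return True


if __name__ == "__main__":
    n = 3
    middle = set(range(n))
    # Every input and output joined to all n middle nodes: 2 n^2 edges.
    print(is_depth2_superconcentrator(n, [middle] * n, [middle] * n))
    # A single shared middle node is a bottleneck for l = 2.
    print(is_depth2_superconcentrator(2, [{0}, {0}], [{0}, {0}]))
```

The first example uses $2n^2$ edges; the cited lower bound says that no construction can get below order $n (\log n)^2 / \log\log n$.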

Let $k$ be an algebraically closed field. Let $f : k^n \to k^n$ be a function. Let
$X = \{x_1, \dots, x_n\}$ be the set of inputs, and let $Y = \{y_1, \dots, y_n\}$ be the set of
outputs. We say that $f$ is super-injective, when for every $l$, for every $X_1 \subseteq X$
and for every $Y_1 \subseteq Y$ satisfying $|X_1| = |Y_1| = l$ there is $a \in k^{n-l}$ such that
$f_a : k^l \to k^l$ is injective, where $f_a$ denotes the function arising from specializing
$f$ to the constants $a$ on the inputs $X \setminus X_1$ and ignoring all outputs in $Y \setminus Y_1$.

Lemma 6. Let $k$ be an algebraically closed field. Let $f : k^n \to k^n$ be a
super-injective polynomial function. From any family of straight line programs for
dynamic evaluation of $f$ and of complexity $d$, we get an $n$-superconcentrator of
depth 2 and with at most $3dn$ edges.

Proof. From the dynamic solution for /, define a graph G as follows. The nodes 
of G is A U Y U Y, where V is the variables used in the dynamic solution for /, 
i.e. we may assume that V = {v\, . . . ,Vm}, where m < 2dn. The edges of G is 
E C (A X Y) U (Y X Y) and (xi,v) G E, if the program for change^ writes the 
variable v. Similarly, (v,yj) G E, if the program for query^ reads the variable v. 
Clearly, \E\ < 3dn. We shall argue that G is a superconcentrator. 

Let I be given, and let Ai C A, Yi C Y be given such that |Ai| = |Yi| = 1. 
Let Yi = A(Ai) n A(Yi). We need to argue that |Yi| > 1. (After permutation 
of indices) we may assume that Ai = {x\, . . . ,xi} and Yi = {yi, . . . ,yi}. Use 
the super-injectivity of / to choose a G k^~^ such that fa.', k^ ^ k^ is injective, 
where fa denotes the function arising from specializing the inputs (xj+i , . . . ,Xn) 
to the constants a = (ai, . . . , a„_;) and ignoring all the outputs (yi+i, . . . , yn)- 
From the dynamic solution for /, construct an off-line solution P = Pi; P2; P3 
for /a as follows 

$P_1$ : change$_{l+1}(a_1)$; ... ; change$_n(a_{n-l})$

$P_2$ : change$_1(x_1)$; ... ; change$_l(x_l)$

$P_3$ : $y_1 :=$ query$_1$; ... ; $y_l :=$ query$_l$

Let $X_l$ denote the values of the input variables $X_l$ (before the execution of
$P_2$), let $V_l$ denote the values of the variables $V_l$ after the execution of
$P_2$ but before the execution of $P_3$, and let $Y_l$ denote the values of the
output variables $Y_l$ (after the execution of $P_3$). Clearly, $V_l$ is a rational
function of $X_l$. Denote this rational function by $g : k^l \to k^{|V_l|}$.
Similarly, $Y_l$ is a rational function of $V_l$, since the output depends on the
input only through the intermediate values $V_l$. Denote this rational function
by $h : k^{|V_l|} \to k^l$. We see that $f_a = h \circ g$. Since $f_a$ is injective,
g must also be injective, and by an earlier lemma this is only possible
if $|V_l| \ge l$.
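
The condition verified in this proof, namely that every pair of input and
output sets of equal size l has at least l common neighbors in the middle
layer, can be checked by brute force on small depth-2 graphs. The following
Python sketch is only an illustration: the function name and the encoding of
the graph as per-input `writes` sets and per-output `reads` sets are
assumptions made here, not notation from the paper.

```python
from itertools import combinations

def has_enough_common_neighbors(n, writes, reads):
    """Check |N(X_l) & N(Y_l)| >= l for all X_l, Y_l of equal size l.

    writes[i] is the set of middle nodes adjacent to input i,
    reads[j] is the set of middle nodes adjacent to output j
    (hypothetical encoding chosen for this sketch).
    """
    for l in range(1, n + 1):
        for xs in combinations(range(n), l):
            nx = set().union(*(writes[i] for i in xs))
            for ys in combinations(range(n), l):
                ny = set().union(*(reads[j] for j in ys))
                if len(nx & ny) < l:
                    return False
    return True

# Every input and output touches all three middle nodes.
full = [{0, 1, 2}, {0, 1, 2}, {0, 1, 2}]
# Every input and output touches one shared middle node only.
thin = [{0}, {0}, {0}]
```

The exhaustive search is exponential in n, so it is only meant for toy
instances such as `full` and `thin` above.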

Lemma 6 and the lower bound for depth 2 superconcentrators of Radhakrishnan
and Ta-Shma together imply the following lemma.

Lemma 7. Let k be an algebraically closed field. Let $f : k^n \to k^n$ be a
super-injective polynomial function. Any family of straight line programs for
dynamic evaluation of f has complexity $\Omega(\log^2 n / \log\log n)$.

It is obvious that a linear map is super-injective if and only if all minors of
the corresponding matrix are non-zero. Thus, by Lemma 7, to show the lower
bound for dynamic DFT claimed earlier for the case where the allowed
set of change-arguments is the entire field, we just need to show that this is
the case for a large $l \times l$ submatrix of the Fourier transform matrix. The
following lemma accomplishes this and completes the proof of that special case.

Lemma 8. Let k be an algebraically closed field of characteristic 0, let
$\omega \in k$ be a primitive n'th root of unity, and let $F = (a_{ij})$ be the
$n \times n$ discrete Fourier transform matrix with $a_{ij} = \omega^{ij}$.

Then F contains an $l \times l$ submatrix B for some
$l = \Omega((n / \log\log n)^{1/3})$ such that all minors of B are nonzero.

Proof. Let $l = \lfloor \phi(n)^{1/3} \rfloor$, where $\phi(n)$ denotes the Euler
phi function, which is also the number of distinct primitive n'th roots of
unity. It is known that
$\liminf_{n \to \infty} \phi(n) \log\log n / n = e^{-\gamma} \approx 0.56$
(see Hardy and Wright [HW54], page 267, Theorem 328), so
$l = \Omega((n / \log\log n)^{1/3})$ as required.
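
As a small numerical companion to this choice of l, the sketch below
computes the Euler phi function by trial division and the largest integer l
with $l^3 \le \phi(n)$, which is the property the degree argument in the proof
relies on. The function names are illustrative assumptions, and the cube-root
choice of l is inferred from that degree argument rather than quoted from the
paper.

```python
def euler_phi(n):
    """Euler's phi function via trial division (fine for small n)."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result

def submatrix_size(n):
    """Largest integer l with l**3 <= phi(n) (assumed choice of l)."""
    phi = euler_phi(n)
    l = round(phi ** (1 / 3))
    while l ** 3 > phi:          # correct floating-point overshoot
        l -= 1
    while (l + 1) ** 3 <= phi:   # correct floating-point undershoot
        l += 1
    return l
```

For a prime n such as 97 we have $\phi(n) = n - 1 = 96$, so the sketch picks
l = 4, since $4^3 = 64 \le 96 < 125 = 5^3$.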

Let z be a variable and let $C(z)$ be the $l \times l$ matrix with the ij'th entry
being $c_{ij} = z^{ij}$. Let $B = C(\omega)$ and note that B occurs as the
$l \times l$ submatrix in the upper left corner of F.

We show that all minors of B are nonzero. Clearly, each minor of $C(z)$ is a
polynomial in z with integer coefficients, and we will later show that no minor
of $C(z)$ is the zero-polynomial. Therefore, each minor in $C(z)$ is a nonzero
polynomial of degree strictly less than $l^3 \le \phi(n)$ (assuming that
$l \ge 2$). This implies that the minors of $B = C(\omega)$ are nonzero. To see
this, observe that $\omega$ is a root of the nth cyclotomic polynomial, which has
degree $\phi(n)$ and is irreducible over the field $\mathbb{Q}$ (see Hungerford,
page 299, Proposition 8.3). Therefore $\omega$ is not a root of any nonzero
polynomial with integer coefficients and of degree strictly smaller
than $\phi(n)$, as k has characteristic 0.

We now show that no minor in the matrix $C(z)$ is the zero-polynomial. Let
an $m \times m$ minor D in $C(z)$ be given by row-indices $i_1 < \cdots < i_m$ and
column indices $j_1 < \cdots < j_m$. By Lemma 9,
$D = z^{i_1 j_1 + \cdots + i_m j_m} + p(z)$, where $p(z)$ is either the
zero-polynomial or has degree strictly less than $i_1 j_1 + \cdots + i_m j_m$.
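
The shape of these minors can be checked mechanically for small l. The
following Python sketch expands each minor of the matrix with entries
$z^{ij}$ (indices starting at 1) by the Leibniz formula and confirms that the
term of highest degree is $z^{i_1 j_1 + \cdots + i_m j_m}$ with coefficient 1,
and that this degree stays below $l^3$. All function names are illustrative
assumptions; polynomials are stored as dictionaries from exponent to
coefficient.

```python
from itertools import combinations, permutations

def perm_sign(perm):
    """Sign of a permutation, computed by counting inversions."""
    sign = 1
    for a in range(len(perm)):
        for b in range(a + 1, len(perm)):
            if perm[a] > perm[b]:
                sign = -sign
    return sign

def minor_polynomial(rows, cols):
    """Leibniz expansion of det(z**(i*j)) for i in rows, j in cols."""
    poly = {}
    for perm in permutations(range(len(rows))):
        exponent = sum(rows[k] * cols[perm[k]] for k in range(len(rows)))
        poly[exponent] = poly.get(exponent, 0) + perm_sign(perm)
    return {e: c for e, c in poly.items() if c != 0}

def check_all_minors(l):
    """True if every minor has leading term z**top with top < l**3."""
    idx = range(1, l + 1)
    for m in range(1, l + 1):
        for rows in combinations(idx, m):
            for cols in combinations(idx, m):
                poly = minor_polynomial(rows, cols)
                top = sum(i * j for i, j in zip(rows, cols))
                if max(poly) != top or poly[top] != 1 or top >= l ** 3:
                    return False
    return True
```

For l = 1 the check fails only because the degree bound $1 < 1^3$ does not
hold, which matches the proviso $l \ge 2$ in the proof.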

Lemma 9. Let two sets of m positive integers each be given, namely I containing
$i_1 < \cdots < i_m$ and J containing $j_1 < \cdots < j_m$. For any permutation
$\sigma$ of $\{1, \ldots, m\}$, let
$S_\sigma = i_1 j_{\sigma(1)} + \cdots + i_m j_{\sigma(m)}$. Then $S_1 > S_\sigma$
for $\sigma \ne 1$, where 1 denotes the identity permutation.

Proof. Let $\sigma$ be a permutation on $\{1, \ldots, m\}$ such that
$\sigma \ne 1$. We will argue that by changing $\sigma$ slightly, we can get a new
permutation $\tau$ (possibly with $\tau = 1$) such that $S_\tau > S_\sigma$, which
suffices to prove the lemma.

Since $\sigma \ne 1$, we can find $a < b$ such that $\sigma(a) > \sigma(b)$. Define
$\tau$ to be identical to $\sigma$ except that $\tau(a) = \sigma(b)$ and
$\tau(b) = \sigma(a)$. This implies that

$S_\tau = S_\sigma - i_a j_{\sigma(a)} - i_b j_{\sigma(b)} + i_a j_{\tau(a)} + i_b j_{\tau(b)}$
$= S_\sigma - i_a j_{\sigma(a)} - i_b j_{\sigma(b)} + i_a j_{\sigma(b)} + i_b j_{\sigma(a)}$
$= S_\sigma + (i_b - i_a)(j_{\sigma(a)} - j_{\sigma(b)}) > S_\sigma$.
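
The exchange step in this proof is easy to test exhaustively on small
inputs. The Python sketch below uses 0-based indices and compares the
directly computed gain $S_\tau - S_\sigma$ with the closed form
$(i_b - i_a)(j_{\sigma(a)} - j_{\sigma(b)})$ for every inversion of every
permutation; the function names are illustrative assumptions.

```python
from itertools import permutations

def weighted_sum(i_vals, j_vals, sigma):
    """S_sigma = sum over k of i_k * j_{sigma(k)} (0-based indices)."""
    return sum(i_vals[k] * j_vals[sigma[k]] for k in range(len(sigma)))

def swap_gain(i_vals, j_vals, sigma, a, b):
    """Return (S_tau - S_sigma, closed form) after exchanging sigma(a), sigma(b)."""
    tau = list(sigma)
    tau[a], tau[b] = sigma[b], sigma[a]
    direct = weighted_sum(i_vals, j_vals, tau) - weighted_sum(i_vals, j_vals, sigma)
    closed_form = (i_vals[b] - i_vals[a]) * (j_vals[sigma[a]] - j_vals[sigma[b]])
    return direct, closed_form

def check_lemma(i_vals, j_vals):
    """Brute-force check for strictly increasing i_vals and j_vals."""
    m = len(i_vals)
    identity = tuple(range(m))
    best = weighted_sum(i_vals, j_vals, identity)
    for sigma in permutations(range(m)):
        if sigma != identity and weighted_sum(i_vals, j_vals, sigma) >= best:
            return False
        for a in range(m):
            for b in range(a + 1, m):
                if sigma[a] > sigma[b]:
                    direct, closed_form = swap_gain(i_vals, j_vals, sigma, a, b)
                    if direct != closed_form or direct <= 0:
                        return False
    return True
```

Strict monotonicity of both index sequences matters: with repeated values
the identity is still a maximizer but no longer the unique one.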

Abstract. We show that it is NP-hard to approximate the traveling
salesman problem with distances one and two within $5381/5380 - \epsilon$,
for any $\epsilon > 0$. Our proof is a reduction from systems of linear equations
mod 2 with two unknowns in each equation and at most three occurrences
of each variable.



1 Introduction 

A common special case of the traveling salesman problem is the metric traveling
salesman problem, where the distances between the cities satisfy the triangle
inequality. In this paper, we study a further specialization: the traveling
salesman problem with distances one and two between the cities. This problem
was shown to be NP-complete by Karp. Since this means that we have little hope
of computing exact solutions, it is interesting to try to find an approximate
solution, i.e., a tour with weight close to the optimum weight. Christofides has
constructed an elegant algorithm approximating the metric traveling salesman
problem within 3/2. This algorithm also applies to the traveling salesman
problem with distances one and two, but it is possible to do better:
Papadimitriou and Yannakakis have shown that it is possible to approximate the
latter problem within 7/6. They also show a lower bound: there exists some
constant, which is never given explicitly in the paper, such that it is NP-hard
to approximate the problem within that constant.

Recently, there has been a renewed interest in the hardness of approximating
the traveling salesman problem with distances one and two. Fernandez de la
Vega and Karpinski and, independently, Fotakis and Spirakis have shown
that the hardness result of Papadimitriou and Yannakakis holds also for dense
instances. We contribute to this line of research by showing an explicit lower
bound on the approximability. More specifically, we construct a reduction from
linear equations mod 2 with three occurrences of each variable to show that it
is NP-hard to approximate the traveling salesman problem with distances one
and two within $5381/5380 - \epsilon$.

2 Preliminaries

Definition 1. We denote by E2-Lin mod 2 the following maximization problem: 
Given a system of linear equations mod 2 with exactly two variables in each 
equation, maximize the number of satisfied equations. We denote by E2-Lin(3) 
mod 2 the special case of E2-Lin mod 2 where there are exactly three occurrences 
of each variable. 
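
To make the definition concrete, here is a minimal Python sketch of the
objective of E2-Lin mod 2 together with a check of the occurrence condition
of E2-Lin(3) mod 2. The encoding of an equation $x + y = b$ as a triple
`(x, y, b)`, the function names, and the toy instance are assumptions made
for this illustration only.

```python
from itertools import combinations, product

def satisfied_count(equations, assignment):
    """Number of equations x + y = b (mod 2) satisfied by the assignment."""
    return sum((assignment[x] ^ assignment[y]) == b for x, y, b in equations)

def occurrences(equations):
    """How often each variable occurs in the system."""
    counts = {}
    for x, y, _ in equations:
        counts[x] = counts.get(x, 0) + 1
        counts[y] = counts.get(y, 0) + 1
    return counts

def optimum(equations, variables):
    """Brute-force maximum number of satisfied equations (toy sizes only)."""
    return max(
        satisfied_count(equations, dict(zip(variables, bits)))
        for bits in product((0, 1), repeat=len(variables))
    )

# Toy instance: one equation x + y = 1 for every pair of four variables,
# so every variable occurs exactly three times.
variables = ["x1", "x2", "x3", "x4"]
equations = [(x, y, 1) for x, y in combinations(variables, 2)]
```

In this toy instance at most four of the six equations can be satisfied at
once, for example by setting two of the variables to one.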



Definition 2. We denote by (1,2)-TSP the traveling salesman problem where
the distance matrix is symmetric and the off-diagonal entries are either one or
two, and by Δ-TSP the traveling salesman problem where the distance matrix is
symmetric and obeys the triangle inequality.

We note in passing that, since (1,2)-TSP is a special case of Δ-TSP, a lower
bound on the approximability of (1,2)-TSP is also a lower bound on the
approximability of Δ-TSP.

To describe a (1,2)-TSP instance, it is enough to specify the edges of weight 
one. We do this by constructing a graph G, and then let the (1,2)-TSP instance 
have the nodes of G as cities. The distance between two cities u and v is defined 
to be one if (u, v) is an edge in G and two otherwise. To compute the weight of 
a tour, it is enough to study the parts of the tour traversing edges of G. 

Definition 3. We call a node where the tour leaves or enters G an endpoint. 
A city with the property that the tour both enters and leaves G in that particular 
city is called a double endpoint, and counts as two endpoints. 

If c is the number of cities and 2e is the total number of endpoints, the weight
of the tour is c + e, since every edge of weight two corresponds to two endpoints.
When we analyze our reduction, we study an arbitrary tour restricted to certain 
subgraphs of G. Generally, such a restriction consists of several disjoint paths. 
To shorten the notation, we call these paths partial tours. 
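
The relation between endpoints and tour weight can be checked on small
examples. The Python sketch below assumes a tour given as a cyclic list of
cities and the graph G given as a list of weight-one edges; both the encoding
and the function name are illustrative assumptions.

```python
def tour_weight_and_endpoints(tour, weight_one_edges):
    """Return (weight, number of endpoints) of a cyclic tour.

    Edges listed in weight_one_edges have weight one, all others weight two.
    A city counts once as an endpoint for each weight-two tour edge at it,
    so a double endpoint contributes two.
    """
    g = {frozenset(edge) for edge in weight_one_edges}
    c = len(tour)
    weight = 0
    endpoints = 0
    for k in range(c):
        u, v = tour[k], tour[(k + 1) % c]
        if frozenset((u, v)) in g:
            weight += 1
        else:
            weight += 2
            endpoints += 2  # both ends of a weight-two edge are endpoints
    return weight, endpoints

# G is a path on four cities; any tour needs at least one weight-two edge.
path_edges = [(0, 1), (1, 2), (2, 3)]
```

With c cities and 2e endpoints, the returned weight always equals c + e,
because the tour uses e edges of weight two and c - e edges of weight one.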

3 Our Construction 

To obtain our hardness result we reduce from E2-Lin(3) mod 2. Previous
reductions from integer programming and satisfiability to (1,2)-TSP make
heavy use of the so-called xor gadget. This gadget is used both to link
variable gadgets with equation gadgets and to obtain a consistent assignment to
the variables in the original instance. The xor gadget contains twelve nodes,
which means that a gadget containing some twenty xor gadgets for each variable,
which is actually the case in the previously known reductions, produces a very
poor lower bound. To obtain a reasonable inapproximability result, we modify
the previously used xor gadget to construct an equation gadget. A specific node
in the equation gadget corresponds to one occurrence of a variable. Since each
variable occurs three times, there are three nodes corresponding to each variable.
These nodes are linked together in a variable cluster. The idea behind this is that
the extra edges in the cluster should force the nodes to represent the same value
for all three occurrences of the variable. This construction contains 24 nodes for
each variable, which is a vast improvement compared to earlier constructions.

Fig. 1. The equation gadget is connected to other gadgets through the vertices A
and G and through the edges shown above from the vertices K and O. Gadget (a)
corresponds to an equation of the form x + y = 1 and gadget (b) to an equation
of the form x + y = 0.

We give our construction in greater detail below. In a sequence of lemmas, we
show that an optimal tour can be assumed to have a certain structure. We do this
by showing that we can transform, by local transformations which do not increase
the length of the tour, any tour into a tour with the sought structure. This new
tour, obtained after the local transformations, can then be used to construct an
assignment to the variables in the original E2-Lin(3) mod 2 instance. Our main
result follows from a recent hardness result of Berman and Karpinski together
with a correspondence between the length of the tour in the (1,2)-TSP instance
and the number of unsatisfied equations in the E2-Lin(3) mod 2 instance.



3.1 The Equation Gadget 

The equation gadget is shown in Fig. 1. It is connected to other gadgets in four
places. The vertices A and G actually coincide with similar vertices at other
gadgets to form a long chain. Thus, these vertices actually have degree two. The
edges from the vertices K and O join the equation gadget with other equation
gadgets. We study this closely in Sec. 3.2. No other vertex in the gadget is joined
with vertices not belonging to the gadget.

Definition 4. From now on we call the vertices K and O in Fig. 1 the
lambda-vertices of the gadget and the boundary edges connected to these vertices
lambda-edges. For short, we often refer to the pair of lambda-edges linked to a
particular lambda-vertex as a lambda. (This name was chosen since the
lambda-edges look like the Greek letter Λ.)



Definition 5. We say that a lambda is traversed if both lambda-edges are tra- 
versed by the tour, untraversed if none of the lambda-edges are traversed, and 
semitraversed otherwise. 

In the following lemmas, we study what an optimal tour can look like. 

Fig. 2. Given that the lambda-edges are traversed as shown above, it is possible
to construct a tour through the equation gadget such that there are no endpoints 
in the gadget. 



Lemma 1. Suppose that we have a tour traversing an equation gadget of the
type shown in Fig. 1a in such a way that there are no semitraversed lambdas
in it. If there is exactly one traversed lambda, it is possible to modify this tour,
without increasing its length and without changing the tour on the lambdas, in
such a way that there are no endpoints in the gadget. Otherwise, it is possible to
construct a tour with two endpoints in the gadget and impossible to construct a
tour with fewer than two endpoints in the gadget.

Suppose that we have a tour traversing an equation gadget of the type shown
in Fig. 1b in such a way that there are no semitraversed lambdas in it. If there are
zero or two traversed lambdas, it is possible to modify this tour, without increasing
its length and without changing the tour on the lambdas, in such a way that there
are no endpoints in the gadget. Otherwise, it is possible to construct a tour with
two endpoints in the gadget and impossible to construct a tour with fewer than two
endpoints in the gadget.



Proof. Figures 2 and 3 show that there exist tours with the number of endpoints
stated in the lemma. To complete the proof, we must show that it is impossible
to construct better tours in the cases where the tour has two endpoints in the
gadget. It is locally optimal to let the tour traverse the edge AB and the edge FG
in Fig. 1. Thus we can assume that one partial tour enters the gadget through
the vertex A and that another, or possibly the same, partial tour enters through
the vertex G. Since there are no semitraversed lambdas in the gadget, the only
way, other than through the above described vertices, a partial tour can leave
the gadget is through an endpoint, which in turn implies that there is an even
number of endpoints in the gadget.

If there is to be no endpoint in the gadget, all of the edges CHL, DIM and EJN
must be traversed by the tour. Also, the edges CD and LM cannot be traversed
simultaneously, neither can DE and MN. The only way we can avoid making the
vertices D and M endpoints is to traverse either the edges CD and MN or the
edges DE and LM.

Fig. 3. Given that the lambda-edges are traversed as shown above, there must
be at least two endpoints in the gadget. There are in fact many tours with this
property; we show only a few above.

Let us suppose that the lambdas in the gadget are traversed as shown in
one of the cases in Fig. 3. By our reasoning above and the symmetry of the
gadget, we can assume that the edges AB, CHLMIDEJN, and FG are traversed by
the tour. To avoid making the vertices C and N endpoints, the tour must traverse
the edges BC and NO. But this is impossible, since the right lambda is already
traversed. Thus, there is no tour with zero endpoints in the gadget, which
implies that it is impossible to construct a tour with fewer than two endpoints
in the gadget. With a similar argument, we conclude that the same holds for the
other cases shown in Fig. 3.

Lemma 2. Suppose that we have a tour traversing an equation gadget in such
a way that there is exactly one semitraversed lambda in it. Then it is possible to
modify this tour, without increasing its length and without changing the tour on
the lambdas, in such a way that there is one endpoint in the gadget, and it is
impossible to construct a tour with fewer than one endpoint in the gadget.

Proof. From Fig. 4 we see that we can always construct tours such that there
is one endpoint in the gadget. We now show that it is impossible to construct a
tour with fewer endpoints. As in the proof of Lemma 1, we can assume that one
partial tour enters the gadget at A and that another, or the same, enters at G.
Since there is one semitraversed lambda in the gadget, one partial tour enters
the gadget at that lambda, which implies that there must be an odd number of
endpoints in the gadget.

Lemma 3. Suppose that we have a tour traversing an equation gadget in such a
way that there are two semitraversed lambdas in it. Then it is possible to modify
this tour, without increasing its length and without changing the tour on the
lambdas, in such a way that there are two endpoints in the gadget, and it is
impossible to construct a tour with fewer than two endpoints in the gadget.

Fig. 4. If one lambda is semitraversed, there must be at least one endpoint in
the gadget.

Fig. 5. If both lambdas in a gadget are semitraversed, there must be at least
two endpoints in the gadget.



Proof. From Fig. 5 we see that we can always construct tours such that there
are two endpoints in the gadget. In order to prove the last part of the lemma we
must argue that it is impossible to traverse the gadget in such a way that there
are no endpoints in it. By an argument similar to that present in the proofs of
Lemmas 1 and 2, a partial tour will enter (or leave) the gadget at four places,
which implies that there is an even number of endpoints in the gadget. If there
are to be no endpoints in the gadget, there must be two partial tours in the
gadget. Since the tours cannot cross each other and the gadget is planar we have
two possible cases.

The first case is that the partial tour entering the gadget at A leaves it at G
(for a gadget of the type shown in Fig. 1a) or at O (for a gadget of the type
shown in Fig. 1b) and the partial tour entering at K leaves at O (for a gadget of
the type shown in Fig. 1a) or at G (for a gadget of the type shown in Fig. 1b).
These two partial tours cannot, however, traverse any of the edges CHL, DIM
and EJN without crossing or touching each other. As noted in the proof of
Lemma 1, all three abovementioned edges must be traversed for the gadget to
contain no endpoints. Thus, we can rule this case out.

The second case is that the partial tour entering the gadget at A leaves it 
at K and the partial tour entering at G leaves at O. Since these two partial 
tours cannot traverse all three edges CHL, DIM and EJN without crossing or 
touching each other, we conclude that at least two endpoints must occur within 
the equation gadget. 



Lemma 4. It is always possible to change a semitraversed lambda to either a 
traversed or an untraversed lambda without increasing the number of endpoints 
in the tour. 

Proof. First suppose that only one of the lambdas in the equation gadget is
semitraversed. By Lemma 2 we can assume that the gadgets are traversed
according to Fig. 4. Let us study one of the tours shown in Fig. 4. By replacing
it with a tour in which this lambda is traversed, we remove one endpoint from
the equation gadget, but we may in that process introduce one endpoint
somewhere else in the graph. Indeed, let λ be the left lambda-vertex of the tour
under study and v be the vertex adjacent to λ through the untraversed
lambda-edge. If v is an endpoint, we simply let the partial tour ending at v
continue to λ, thereby saving one endpoint. If v is not an endpoint, we have to
reroute the tour at v to λ. This introduces an endpoint at a neighbor of v, but
that endpoint is set off against the endpoint removed from the equation gadget.
To sum up, we have shown that it is possible to convert this tour to one in
which the lambda is no longer semitraversed without increasing the total number
of endpoints in the graph. The remaining tours in Fig. 4 can be converted in a
similar way.

Finally, suppose that both lambdas are semitraversed. By Lemma 3 we can
assume that the gadgets are traversed according to Fig. 5. By the method
described in the previous paragraph, each of the tours in Fig. 5 can be
converted to a tour without semitraversed lambdas.

3.2 The Variable Cluster 

The variable cluster is shown in Fig. 6. The vertices A and B coincide with similar
vertices at other gadgets to form a long chain, as described in Sec. 3.3. Suppose
that the variable cluster corresponds to some variable x. Then the upper three
vertices in the cluster are lambda-vertices in the equation gadgets corresponding
to equations where x occurs. The remaining two vertices in the cluster are not
joined with vertices outside the cluster.

Lemma 5. Suppose that we have a tour traversing a cluster in such a way
that there are some semitraversed lambdas in it. Then, it is possible to modify
the tour, without making it longer, in such a way that there are no semitraversed
lambdas in the cluster.

Fig. 6. There is one lambda-vertex for each occurrence of each variable. The
three lambda-vertices corresponding to one variable in the system of linear
equations are joined together in a variable cluster. The three uppermost vertices
in the figure are the lambda-vertices.

Proof. Suppose that there is one semitraversed lambda. If the semitraversed
lambda is the middle lambda of the variable cluster it can, by Lemma 4, be
transformed into either a traversed or an untraversed lambda. This moves the
semitraversed lambda to the end of the cluster. By moving the endpoint in
the variable cluster to the equation gadget corresponding to the semitraversed
lambda, we can make the last semitraversed lambda traversed or untraversed
without changing the number of endpoints.

Suppose now that there are two semitraversed lambdas. By Lemma 4 they
can be transformed into either traversed or untraversed lambdas without
changing the number of endpoints in the tour. This implies that we can transform
the tour in such a way that there is only one semitraversed lambda without 
changing the number of endpoints in the tour. Then we can use the method 
from the above paragraph to transform that tour in such a way that there are 
no semitraversed lambdas. 

Finally, suppose that all three lambdas are semitraversed. By Lemma 4 the
tour can be transformed in such a way that the two outer lambdas in the variable
cluster are either traversed or untraversed without changing the weight of the
tour. If the center lambda is not semitraversed after the transformation, the
proof is complete. Otherwise we can apply the first paragraph of this proof.

3.3 The Entire (1,2)-TSP Instance 

To produce the (1,2)-TSP instance, all equation gadgets are linked together in
series, followed by all variable clusters. The first equation gadget is also linked
to the last variable cluster. The construction is shown schematically in Fig. 7.
The precise order of the individual gadgets in the chain is not important.

Our aim in the analysis of our construction is to show that we can construct,
from a tour containing e edges of weight two, an assignment to the variables
such that at most e equations are not satisfied. When we combine this with
Berman and Karpinski's recent hardness results for E2-Lin(3) mod 2, we
obtain Theorem 2, our main result.

Theorem 1. Given a tour with 2e endpoints, we can construct an assignment 
leaving at most e equations unsatisfied. 

Proof. Given a tour, we can by the preceding lemmas construct a new tour,
without increasing its length, such that for each variable cluster either all or
no lambda-edges are traversed. Then we can construct an assignment as follows:
If the lambda-edges in a cluster are traversed by the tour, the corresponding
variable is assigned the value one; otherwise it is assigned zero. By the lemmas
in Sec. 3.1, this assignment has the property that there are two endpoints in the
equation gadgets corresponding to unsatisfied equations. Thus, the assignment
leaves at most e equations unsatisfied if there are 2e endpoints.

Fig. 7. All equation gadgets and all variable clusters are linked together in a
circular chain as shown schematically above. The equation gadgets are at the
top of the figure and the variable clusters at the bottom. The precise order of
the gadgets is not important. For clarity, we have omitted nine vertices from
each equation gadget.



Theorem 2. It is NP-hard to decide whether an instance of the traveling
salesman problem with distances one and two with 5376n nodes has an optimum tour
with length above $(5381 - \epsilon_1)n$ or below $(5380 + \epsilon_2)n$.

Corollary 1. It is for any $\epsilon > 0$ NP-hard to approximate the traveling
salesman problem with distances one and two within $5381/5380 - \epsilon$.

Proof (of Theorem 2). The result of Berman and Karpinski states that it is
NP-hard to determine if an instance of E2-Lin(3) mod 2 with 336n equations
has its optimum above $(332 - \epsilon_2)n$ or below $(331 + \epsilon_1)n$. If we
construct from an instance of E2-Lin(3) mod 2 an instance of (1,2)-TSP as
described above, the graph contains 48n nodes if the E2-Lin(3) mod 2 instance
contains 2n variables and 3n equations. Thus, Theorem 1 and the above hardness
result together imply that it is NP-hard to decide whether an instance of
(1,2)-TSP with 5376n nodes has an optimum tour with length above
$(5381 - \epsilon_1)n$ or below $(5380 + \epsilon_2)n$.
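
The constants in this proof can be re-derived with a few lines of exact
arithmetic. The Python sketch below drops the $\epsilon$ terms and works per
unit of n; it assumes, as in the text, 24 nodes per variable and a tour weight
equal to the number of nodes plus the number of unsatisfied equations. The
function and parameter names are illustrative assumptions.

```python
from fractions import Fraction

def tsp_instance_numbers(equations=336, sat_high=332, sat_low=331,
                         nodes_per_variable=24):
    """Return (nodes, short tour, long tour), each per unit of n."""
    # Two variables per equation and three occurrences per variable
    # give 3 * variables = 2 * equations.
    variables = Fraction(2 * equations, 3)
    nodes = variables * nodes_per_variable
    short_tour = nodes + (equations - sat_high)  # many equations satisfiable
    long_tour = nodes + (equations - sat_low)    # few equations satisfiable
    return nodes, short_tour, long_tour

nodes, short_tour, long_tour = tsp_instance_numbers()
gap = Fraction(long_tour, short_tour)
```

The resulting gap between the two tour lengths is the ratio 5381/5380 that
appears in Corollary 1.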

4 Concluding Remarks 

We have shown in this paper that it is for any $\epsilon > 0$ NP-hard to
approximate (1,2)-TSP within $5381/5380 - \epsilon$. Since the best known upper
bound on the approximability is 7/6, there is certainly room for improvements.
Our lower bound follows from a sequence of reductions, which makes it unlikely
to be optimal. The sequence starts with E3-Lin mod 2, systems of linear equations
mod 2 with exactly three variables in each equation. Then follow reductions
to, in turn, E2-Lin mod 2, E2-Lin(3) mod 2, and (1,2)-TSP. Thus, our hardness
result ultimately follows from Håstad's optimal lower bound on E3-Lin mod 2.

An obvious way to improve the lower bound is to improve the reductions used in
each step, in particular our construction in this paper and the construction of
Berman and Karpinski. It is probably harder to improve the lower bound on
E2-Lin mod 2, since the gadgets used in the reduction from E3-Lin mod 2 to
E2-Lin mod 2 are optimal, in the sense that better gadgets do not exist for that
particular reduction. Even better would be to obtain a direct proof of a lower
bound on (1,2)-TSP. It would also be interesting to study the approximability
of Δ-TSP in general, and try to determine if Δ-TSP is harder to approximate
than (1,2)-TSP.

Abstract. In parallel and distributed computing, scheduling low-level
tasks on the available hardware is a fundamental problem. Traditionally,
one has assumed that the set of tasks to be executed is known beforehand.
Then the scheduling constraints are given by a precedence graph.
Nodes represent the elementary tasks and edges the dependencies among
tasks. This static approach is not appropriate in situations where the set
of tasks is not known exactly in advance, for example, when different
options for how to continue a program may be granted.

In this paper a new model for parallel and distributed programs, the
dynamic process graph, will be introduced, which represents all possible
executions of a program in a compact way. The size of this representation
is small, in many cases only logarithmic with respect to the
size of any execution. An important feature of our model is that the
encoded executions are directed acyclic graphs having a "regular" structure
that is typical of parallel programs. Dynamic process graphs embed
constructors for parallel programs, synchronization mechanisms as well
as conditional branches. With respect to such a compact representation
we investigate the complexity of different aspects of the scheduling problem:
the question whether a legal schedule exists at all and how to find
an optimal schedule. Our analysis takes into account communication delays
between processors exchanging data. Precise characterization of the
computational complexity of various variants of this compact scheduling
problem will be given in this paper. The results range from easy, that is
NLOGSPACE-complete, to very hard, namely NEXPTIME-complete.



1 Introduction 

Scheduling tasks efficiently is crucial for fast executions of parallel and
distributed programs. An intensive study of this scheduling problem has led to the
development of a number of algorithms that cover a wide spectrum of strategies:
from fully static, where the compiler completely precomputes the schedule,
i.e. when and where each task will be executed, to fully dynamic, where tasks
are scheduled at run-time only. Changing from static to dynamic strategies one
gets the potential of reducing the total execution time of a program because the
resources are better used, but in general there will be more effort necessary at
run-time. Therefore, existing parallel systems fix most details of the schedule
already at compile time. For a more dynamic and also fault tolerant
approach see for example the MAFT project.

In many cases, the set of tasks that have to be executed is not precisely
known at compile time. El-Rewini and Ali have introduced a parallel program
model that allows a suitable data representation for static scheduling algorithms.
The representation is based on two directed graphs: the branch graph and the
precedence graph. This approach models conditional branchings quite well, but
it is unsuited for parallel program constructors or synchronization mechanisms,
for example the channel concept as implemented in the parallel programming
language OCCAM.

1.1 Extension to a Dynamic Environment 

In this paper we introduce a new model, the dynamic process graph, DPG, 
which allows a natural representation for parallel and distributed programs. In 
particular, it gives a concise description of static scheduling problems for highly 
concurrent programs. An important feature of DPGs is that they resemble the 
characteristics of executing typical parallel or distributed programs, and this in a 
space-efficient way. This representation can provide an exponential compaction 
compared to the length of execution sequences. 

Our main technical contribution is to analyze the complexity of finding op- 
timal schedules with respect to this compact program representation. We will 
concentrate on scheduling elementary tasks where each task has unit execution 
time. No bound will be put on the number of processors available. 

1.2 Communication Delays 

Papadimitriou and Yannakakis have argued that scheduling policies should
take into account communication delays occurring when one processor sends a
piece of data to another one. Thus, it will be faster to schedule dependent tasks
on the same processor. Further results concerning scheduling with communication
delays are known for the standard static setting. Here, we will extend
the complexity analysis to the dynamic case.

Communication delays will be specified by a function $\delta : E \to \mathbb{N}$,
which defines the time necessary to send the data from one processor to another
one. For simplification we assume that this delay is independent of the particular
pair of processors (alternatively that $\delta(e)$ gives an upper bound on the
maximal delay). Scheduling with communication delays requires the following
condition to be fulfilled: if a task v is executed on processor p at time t then
for each direct predecessor u of v the following holds:

u has been finished either on p by time $t - 1$, or on some other processor p'
by time $t - 1 - \delta(u, v)$.
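
The delay condition can be turned into a small feasibility check. The
Python sketch below makes several simplifying assumptions that are not taken
from the paper: every task has unit execution time and is executed exactly
once, a task executed in step t counts as finished by time t, and a schedule
is a dictionary mapping each task to a pair `(processor, time)`. The function
name is likewise illustrative.

```python
def respects_delays(edges, delay, schedule):
    """Check the communication-delay condition for every edge (u, v).

    edges: list of (u, v) pairs; delay[(u, v)] is the delay of that edge;
    schedule[task] = (processor, time step of execution).
    """
    for u, v in edges:
        proc_u, time_u = schedule[u]
        proc_v, time_v = schedule[v]
        if proc_u == proc_v:
            latest_finish = time_v - 1
        else:
            latest_finish = time_v - 1 - delay[(u, v)]
        if time_u > latest_finish:
            return False
    return True

edges = [("u", "v")]
delay = {("u", "v"): 2}
same_processor = {"u": (0, 1), "v": (0, 2)}
too_early = {"u": (0, 1), "v": (1, 2)}
after_delay = {"u": (0, 1), "v": (1, 4)}
```

In the toy example, running v on a different processor from u is only
allowed from step 4 onwards, whereas on the same processor it can follow u
immediately.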

1.3 Dynamic Dependencies

In many systems, dependencies among particular task-instances are determined 
by a scheduling policy, not by the program itself. Consider the following situa- 
tion. A program contains a part P where several processes concurrently generate 
data. The program can continue as soon as one of these results is available. Such 
a situation can compactly be described using the ALT-constructor of OCCAM 
PQ. In the piece of code given in the left part of Figure 1, one process sends data
down channel C1, while a second one behaves similarly using channel C2.

    P
    ALT
      C1 ? x
        P1
      C2 ? x
        P2

Fig. 1. A dynamic precedence graph representing P; the output mode of P is
ALT and the input mode of alternative$_1$ and alternative$_2$ is PAR.

A scheduling policy has to select one (and only one) of the alternatives $P_i$.
Hence, either P1 or P2 will be a successor of P. If both channels $C_i$ get ready at
the same time, then a scheduler can choose arbitrarily. However, to minimize the
total execution time it is helpful to know which process can be executed faster,
P1 or P2. Even if the two channels do not get ready simultaneously, executing a
ready alternative immediately may overall lead to a longer schedule than waiting
until the other channel is ready.

We will consider different degrees of concurrency expressed by the ALT and 
the PAR constructor to create parallel processes, and analyze the complexity of 
the corresponding scheduling problems with respect to the amount of concur- 
rency. 

To represent such parallel programs in a natural way, we introduce dynamic
process graphs, which are generalizations of standard precedence graphs. A
dynamic process graph is an acyclic graph $G = (V, E)$ with two sets of labels
$I(v), O(v) \in \{\mathrm{PAR}, \mathrm{ALT}\}$ attached to the nodes $v \in V$. Nodes represent tasks,
edges dependencies between tasks. A complete formal definition will be given in
the next section.

The label $I(v)$ describes the input mode of task $v$. If $I(v) = \mathrm{ALT}$ then to
execute $v$ at least one of the predecessor tasks $u$ with $(u, v) \in E$ has to be
completed. $I(v) = \mathrm{PAR}$ requires that executions of all predecessors of $v$ have to
be completed before $v$ can start. If task $v$ has been completed then according
to the output mode $O(v)$ one of $v$'s successors in case $O(v) = \mathrm{ALT}$ (resp. all of
them in case $O(v) = \mathrm{PAR}$) has to be initiated. Fig. 1 gives an example of such
a representation.

Dynamic process graphs are a compact way to illustrate data dependencies of 
parallel programs written e.g. in a parallel programming language like OCCAM 
or Ada. Note that a standard precedence graph cannot represent such programs
in a simple way. We should note that dynamic process graphs can also be modeled 
by a certain class of Petri nets and their reachability problem. However, this class 
does not seem to correspond to those subclasses that have been considered in 
more detail in the literature. We will therefore stick to the DPG model. 

1.4 New Results 

The scheduling problem for dynamic process graphs is a natural generalization
of the static scheduling problem. In particular, the delay scheduling for a
precedence graph $G$ is equivalent to the scheduling problem for the dynamic process
graph $(G, I, O)$ with $I, O \equiv \mathrm{PAR}$. In the static case, the scheduling problem with
communication delays is already computationally difficult. In m it has been
shown that this problem is NP-complete even if for each graph the communication
delay takes only a single value, but this value has to increase with the size of
the graph. We have improved this result in jS| showing that the problem remains
NP-complete even if we restrict the class of precedence graphs to (1,2)-trees. On
the other hand, in P| it has been shown that for fixed delay $\delta = c$ independent
of the precedence graphs the problem can be solved in polynomial time where
the degree of the polynomial grows with $c$.

Due to the compact representation in our dynamic model it is no longer
obvious that the dynamic scheduling problem can be solved in NP at all. In
fact, one of our main results is that even restricted to constant communication
delay the scheduling problem for dynamic process graphs is NEXPTIME-complete.
To prove this we construct a reduction of the SUCCINCT-3SAT
problem. However, if we restrict the input mode $I$ to ALT then the problem
becomes P-complete. Even more, also fixing the output mode the problem
becomes NLOGSPACE-complete. A similar complexity jump has been observed
for classical graph problems in nmnu. There it is shown that simple graph
properties become NP-complete when the graph is represented in a particular
succinct way using generating circuits or a hierarchical decomposition. Under
the same representation graph properties that are ordinarily NP-complete, like
HAMILTON CYCLE, 3-COLORABILITY, CLIQUE (of size $|V|/2$), etc.,
become NEXPTIME-complete.

On the other hand, some restricted variants of this scheduling problem, which
may seem to be easy at first glance, remain hard, namely NP-complete. Fig. 2
summarizes our results about the complexity of the dynamic delay scheduling
problem with respect to the input and output modes that may occur in the
graphs.

We will also consider the question whether for a given dynamic process graph 
there exists a schedule for a program represented by the graph at all. It will be 
shown that the problem is NP-complete even if the input mode is restricted to
PAR and the output mode to ALT. 

The remaining part of this paper is organized as follows. In Section 2 we give 
some examples and a formal definition of DPGs and the scheduling problem.
Section 3 studies the complexity of the existence problem. The last section deals 



Scheduling Dynamic Graphs 387 



input mode    output mode               complexity
ALT           ALT                       NLOGSPACE-complete
ALT           PAR                       NLOGSPACE-complete
ALT           ALT, PAR                  P-complete
PAR           ALT, PAR or PAR or ALT    NP-complete
ALT, PAR      ALT                       NP-complete
ALT, PAR      PAR                       BH_2-hard
ALT, PAR      ALT, PAR                  NEXPTIME-complete

Fig. 2. The complexity of dynamic scheduling with communication delays with
respect to input and output modes.

with the problem to find optimal schedules. Due to space limitations we will 
only present the technically most difficult result, which is scheduling unrestricted 
DPGs, and give a short sketch of the construction for the lower bound. 

2 Dynamic Process Graphs and Runs

For illustration, consider the following parallel program P written in OCCAM
(Fig. 3). This program contains branches, but the situation is not as bad as it
could be since the branching does not depend on the current values of variables.
Still there is the problem of determining the set of tasks that have to be executed
at run time. It will turn out that even for such restricted programs the scheduling
problem is quite hard. Depending on the chosen ALT branches, the execution of
this program is represented by one of the four possible runs shown in Fig. 4. The
following definition tries to capture this dichotomy between parallel/distributed
programs and their executions.

Definition 1. A dynamic process graph $\mathcal{G} = (G, I, O)$ is a directed acyclic
graph (DAG) $G = (V, E)$, with node labellings $I, O : V \to \{\mathrm{ALT}, \mathrm{PAR}\}$.
$V = \{v_1, v_2, \dots, v_n\}$ represents a set of processes and $E$ dependencies among them.
$I$ and $O$ describe input modes (resp. output modes) of the processes.

A finite DAG $H_{\mathcal{G}} = (W, F)$ is a run of $\mathcal{G}$ iff the following conditions are fulfilled:

1. The set $W$ is partitioned into subsets $W(v_1) \cup \dots \cup W(v_n)$. The nodes in
$W(v_i)$ are execution instances of the process $v_i$ and will be called tasks.

2. Each source node of $G$ has exactly one execution instance in $H_{\mathcal{G}}$.

3. For a process $v \in V$ let $pred(v) := \{u_1, u_2, \dots, u_p\}$ denote the set of all
predecessors of $v$ and $succ(v) := \{w_1, w_2, \dots, w_r\}$ its successors. For any
execution instance $x$ of $v$ in $W(v)$ it has to hold

• if $I(v) = \mathrm{ALT}$ then $x$ has a unique predecessor $y$ belonging to $W(u_i)$ for
some $i \in \{1, \dots, p\}$;

• if $I(v) = \mathrm{PAR}$ then $pred(x) = \{y_1, y_2, \dots, y_p\}$ with $y_i \in W(u_i)$ for each
$i \in \{1, \dots, p\}$;
    SEQ
      ALT
        in1 ? x
          P1
        in2 ? x
          P2
      PAR
        P3
        P4
      ALT
        in1 ? y
          P5
        in2 ? y
          P6

Fig. 3. OCCAM program P.

Fig. 4. Possible runs of P.



• if $O(v) = \mathrm{ALT}$ then $x$ has a unique successor $z$ belonging to $W(w_j)$ for
some $j \in \{1, \dots, r\}$;

• if $O(v) = \mathrm{PAR}$ then $succ(x) = \{z_1, z_2, \dots, z_r\}$ with $z_j \in W(w_j)$ for each
$j \in \{1, \dots, r\}$.
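
The conditions of Definition 1 translate directly into a checker for candidate runs. The sketch below is an illustration under stated assumptions, not the authors' algorithm: processes are named by strings, the run is given as a task set `W`, an edge set `F` over `W`, and a map `proc` from tasks to processes (all hypothetical names), and the graph G is assumed to be acyclic without being checked. For sources and sinks the mode label is ignored, as the text allows.

```python
from collections import defaultdict

ALT, PAR = "ALT", "PAR"


def is_run(V, E, I, O, W, F, proc):
    """Return True iff (W, F) satisfies conditions 1-3 of Definition 1.

    V, E : processes and edges of the DAG G (acyclicity assumed, not checked)
    I, O : dicts mapping each process to ALT or PAR
    W, F : tasks and edges of the candidate run (edges join tasks in W)
    proc : dict mapping each task to the process it is an instance of
    """
    # Condition 1: every task is an instance of exactly one process of G.
    if any(proc.get(x) not in V for x in W):
        return False

    pred, succ = defaultdict(set), defaultdict(set)
    for u, v in E:
        pred[v].add(u)
        succ[u].add(v)
    tpred, tsucc = defaultdict(list), defaultdict(list)
    for x, y in F:
        tpred[y].append(proc[x])  # processes of x's neighbours in the run
        tsucc[x].append(proc[y])

    def mode_ok(mode, seen, required):
        if not required:          # source or sink: no neighbours allowed
            return not seen
        if mode == ALT:           # exactly one neighbour, from one process
            return len(seen) == 1 and seen[0] in required
        return sorted(seen) == sorted(required)  # PAR: one from each

    # Condition 2: each source of G has exactly one execution instance.
    for v in V:
        if not pred[v] and sum(proc[x] == v for x in W) != 1:
            return False

    # Condition 3: the input and output mode hold for every task.
    return all(
        mode_ok(I[proc[x]], tpred[x], pred[proc[x]])
        and mode_ok(O[proc[x]], tsucc[x], succ[proc[x]])
        for x in W
    )
```

For a process P with successors P1 and P2, a run containing only one alternative is accepted when the output mode of P is ALT and rejected when it is PAR.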

Fig. 5 shows a node $v$ of a DPG with input mode $Q_1$ and output mode $Q_2$.
Throughout the paper we will illustrate the ALT-mode by a white box, the PAR-mode
by a black box. For a source or a node with indegree 1 the input mode is
obviously inessential. Hence we will ignore such a label, and similarly the output
label in case of a sink or a node with outdegree 1. A dynamic process graph
corresponding to the program of Fig. 3 is shown in Fig. 6 (a).

Observe that a run can be smaller than its defining dynamic process graph,
e.g. the graph in Fig. 6 (a) has 10 nodes while its run (b) has only 8. Hence,
certain tasks are not executed at all. More typically, however, a run will be larger
than the dynamic process graph itself since the PAR-constructor allows process
duplications. If, for example, the output mode of vertex $v_1$ in Fig. 6 is changed
from ALT into PAR then both processes P1 and P2 have to be executed. Hence,
process $v_2$ with input mode ALT has to be duplicated in order to consume both
processes. This in turn implies that P3 and P4 will have 2 execution instances
each. Therefore, each run of the modified graph consists of 15 nodes. The following
lemma gives an upper bound on this blow-up, resp. the possible compaction
ratio of dynamic process graphs.

Lemma 1. Let Q = ((V, A), /, O) be a dynamic process graph and Hg = (VF, F) 
be a corresponding run. Then it holds \W\ < Moreover, this general 

bound is tight. 

Hence, there are dynamic process graphs where processes have exponentially
many execution instances. Note that a similar effect occurs by using the replicated
PAR- and ALT-constructor of OCCAM, which allows an exponential blow
up of the number of active tasks.

Fig. 5. A node $v$ with input label $Q_1$ and output label $Q_2$ and the
schematic representation (panels: $Q_1$ = ALT, $Q_2$ = ALT; $Q_1$ = ALT, $Q_2$ = PAR).

Fig. 6. (a) a dynamic process graph for program P, (b) a run of this graph.

Lemma 2. Any dynamic process graph has at most doubly exponentially many
different runs and this bound can actually occur.

Definition 2. Let $\mathcal{G} = (G, I, O)$ be a dynamic process graph with communication
delay $\delta : E \to \mathbb{N}$ between its processes. A schedule for $\mathcal{G}, \delta$ is a schedule of a
run $H = (W, F)$, where the communication delay between each pair of execution
instances $x \in W(u)$ and $y \in W(v)$ is given by $\delta(u, v)$.

If $S$ is a schedule of $\mathcal{G}, \delta$ then let $T(S)$ denote the duration of $S$, i.e. the
amount of time necessary to complete all tasks in $S$. Define
$T_{opt}(\mathcal{G}, \delta) := \min_{S \text{ for } \mathcal{G}, \delta} T(S)$.

This leads to the following decision problem: 

Definition 3. DPG-SCHEDULING (DPGS): Given a DPG $\mathcal{G} = (G, I, O)$, a
communication delay $\delta$, and a deadline $T^*$, does $T_{opt}(\mathcal{G}, \delta) \le T^*$ hold?

3 The Execution Problem 

The number of different runs of a DPG can be huge according to Lemma 2. On
the other hand, it is not obvious that for any DPG an appropriate run exists at
all. It is easy to see that dynamic process graphs with either only PAR labels,
or with all input labels equal to ALT can always be executed. The first case
corresponds to standard static precedence graphs. However, this is no longer
true for arbitrary DPGs. For a simple example of a graph which has no run see
Fig. 7. In this case the input mode of all nodes is PAR and their output mode
ALT. This section studies the problem whether a given dynamic process graph
has a legal run.

Definition 4. Execution Problem for DPGs (ExDPG): Given a DPG $\mathcal{G}$,
decide whether it can be executed, that means whether it has a run.

As we have seen above for some restricted DPGs this question has a trivial 
answer. In general, a decision procedure may be complex. Of course, if a DPG 
has a run then it has also a schedule, and since we have estimated a bound on the 
size of the run, one can also compute an upper bound on the maximal schedule 
length given the maximal communication delay. Thus, the execution problem 
for DPGs could be solved by a reduction to the scheduling problem with a huge 
enough deadline. However, we want to capture the complexity of the execution 
problem more precisely and thus will investigate it directly. 

The main negative result of this section says that for graphs with arbitrary
input and output modes the execution problem is NP-complete. We will prove
even more, namely the problem remains NP-complete even if the input mode
of all nodes is PAR while the output mode is ALT. On the positive side, the
complexity decreases drastically for DPGs with output labels restricted to PAR.
Fig. 8 summarizes our results about the complexity of the ExDPG problem with
respect to input and output modes that may occur in the graphs.




Fig. 7. Example of a dynamic process graph with O = ALT and I = PAR that has
no run.

input mode    output mode    complexity
ALT           arbitrary      trivial
PAR           PAR            trivial
ALT, PAR      PAR
PAR           ALT            NP-complete
ALT, PAR      ALT            NP-complete
ALT, PAR      ALT, PAR       NP-complete

Fig. 8. The complexity of the ExDPG problem with respect to input and output
modes.




4 Scheduling Dynamic Process Graphs 

Let us now consider the problem to construct optimal schedules for dynamic 
process graphs. Since the execution problem is already hard in the less restricted 
cases one has to expect similar negative results for the scheduling problem. Our 
main result below, however, implies that the compaction provided by dynamic 
process graphs is quite efficient. The complexity of scheduling general DPGs 
increases to NEXPTIME-complete. The hardness proof will be the topic of
this section. We have also analysed the complexity for restricted classes of DPGs. 
They will be mentioned at the end of this section. 
Theorem 1. Scheduling dynamic process graphs is NEXPTIME-complete,
even for constant communication delay.

Sketch of the proof. For the reduction we will use the following problem:

Definition 5. SUCCINCT-3SAT: As input we are given a Boolean circuit
over the standard AND, OR, NOT-basis that succinctly codes a Boolean formula
in conjunctive normal form with the additional property that each clause has
exactly three literals and each literal appears exactly three times. Suppose that
the encoded formula consists of $n$ variables and $m$ clauses. On input $(0, i, k)$ with
$i \in \{0, \dots, n - 1\}$ and $k \in \{1, 2, 3\}$ (appropriately coded in binary), the coding
circuit returns the index of the clause where the literal $\neg x_i$ appears the $k$-th time.
On input $(1, i, k)$ it returns the index of the clause where $x_i$ appears for the $k$-th
time. On input $(2, j, k)$ with $j \in \{0, \dots, m - 1\}$ and $k \in \{1, 2, 3\}$, it returns the
$k$-th literal of the $j$-th clause.

The problem is to decide whether the encoded formula is satisfiable. 

The NEXPTIME-completeness for this problem has been proved by Papadimitriou
and Yannakakis in HH.

For the reduction of SUCCINCT-3SAT to the scheduling problem, the first
crucial step is a transformation of a Boolean circuit $B$ into a DPG $\mathcal{G}$. Our
encoding achieves the following property. Let $x_1, x_2, \dots, x_r$ be the input gates
of $B$, and $v_1, v_2, \dots, v_s$ be its output gates. In the coding graph $\mathcal{G}$ there will be
nodes $x_{fi}, x_{ti}$ for $1 \le i \le r$, and nodes $v_{fj}, v_{tj}$ for $1 \le j \le s$. The meaning of the
indices is as follows: $x_{fi}$ codes the value false for $x_i$, while $x_{ti}$ codes true; similarly $v_{fj}$
and $v_{tj}$ code values false and true for $v_j$. To simplify notation, let $x_i(\mathrm{false}) := x_{fi}$
and $x_i(\mathrm{true}) := x_{ti}$, etc. Then the following holds: for any input $(b_1, b_2, \dots, b_r)$,
$B$ returns $(c_1, c_2, \dots, c_s)$ iff there exists a run for $\mathcal{G}$ such that each of the process
nodes $x_1(b_1), \dots, x_r(b_r)$ and $v_1(c_1), \dots, v_s(c_s)$ has one execution instance, and
none of the complementary nodes $x_1(\neg b_1), \dots, x_r(\neg b_r), v_1(\neg c_1), \dots, v_s(\neg c_s)$ has
any execution instance.

The total construction consists of two disjoint graphs made up of the encoding 
of the input circuit B and some specific auxiliary modules. The first graph will 
be responsible for checking whether the circuit B generates a Boolean formula 
according to the syntax of SUCCINCT-3SAT. The second graph checks whether 
the encoded formula is satisfiable. The constructions are technically involved
and a complete description of them as well as an NEXPTIME algorithm for the
scheduling problem can be found in |S|. □

If one restricts the DPGS problem to specific combinations of input or output 
modes then in most cases the computational complexity decreases significantly. 
In Figure 2 we list the results of our investigations. The proofs of these hardness
results and the algorithms to establish the upper bounds can be found in jS| . 

5 Conclusion 

We have defined a dynamic model for scheduling process graphs. These dynamic 
process graphs allow a compact representation of typical distributed programs 



392 Andreas Jakoby, Maciej Liskiewicz, and Rudiger Reischuk 



written in OCCAM or Ada style. We are not aware of another framework with 
a similar expressive power. By restricting the input and output modes of nodes,
different degrees of concurrency can be modelled.

With respect to the degree of concurrency we have analysed how difficult it is 
to decide whether a dynamic process graph can be executed and to construct op- 
timal schedules. An almost complete exact characterization could be given. Only 
for the scheduling problem with output mode restricted to PAR there remains a 
gap. 

For the general case, we have shown an exponential complexity jump when 
scheduling process graphs in the dynamic setting. This implies that our compact 
dynamic representation is quite effective. 

Finally, note that although constructing optimal schedules of standard graphs 
cannot be done efficiently, at least the problem can be approximated to a certain 
factor by simple algorithmic methods. This even holds when one takes communi- 
cation delays into account. We do not have any nontrivial approximation results 
for scheduling dynamic process graphs. 
Abstract. Counting networks are a class of distributed data structures
that support highly concurrent implementations of shared Fetch&Increment
counters. Applications of these counters include shared pools and
stacks, load balancing, and software barriers 1121 1131 IT^ . A limitation
of counting networks is that the resulting shared counters can be
incremented, but not decremented.

A recent result by Shavit and Touitou m showed that the subclass of
tree-shaped counting networks can support, in addition, decrement
operations. This paper generalizes their result, showing that any counting
network can be extended to support atomic decrements in a simple and
natural way. Moreover, it is shown that decrement operations can be
supported in networks that provide weaker properties, such as K-smoothing.

In general, we identify a broad class of properties, which we call bound- 
edness properties, that are preserved by the introduction of decrements: 
if a balancing network satisfies a particular boundedness property for 
increments alone, then it continues to satisfy that property for both in- 
crements and decrements. 

Our proofs are purely combinatorial and rely on the novel concept of a 
fooling pair of input vectors. 
1 Introduction



Counting networks were originally introduced by Aspnes, Herlihy, and Shavit ^
and subsequently extended Dcniiii]. They support highly concurrent
implementations of shared Fetch&Increment counters, shared pools and stacks, load
balancing modules, and software barriers HCiiTiiTg.

Counting networks are constructed from basic elements called balancers. A 
balancer can be thought of as a routing switch for elements called tokens. It has a 
collection of input wires and a collection of output wires, respectively called the 
balancer’s fan-in and fan-out. Tokens arrive asynchronously on arbitrary input 
wires, and are routed to successive output wires in a “round-robin” fashion. 
If one thinks of a balancer as having a “toggle” state variable tracking which
output wire the next token should exit on, then a token traversal amounts to
a Fetch&Toggle operation, retrieving the value of the output wire and changing
the toggle state to point to the next wire. The distribution of tokens on the
output wires of a balancer thus satisfies the step property: if $y_i$ tokens exit
on output wire $i$, then $0 \le y_i - y_j \le 1$ for any $j > i$.

A balancing network is a network of balancers, constructed by connecting 
balancers’ output wires with other balancers’ input wires in an acyclic fashion, 
in a way similar to the way comparator networks are constructed from compara- 
tors jOl Chapter 28]. The network itself has a number of input and output wires. 
A token enters the network on an input wire, traverses a sequence of balancers, 
and exits on an output wire. A balancing network is a K-smoothing network Pi
if, when all tokens have exited the network, the difference between the maximum 
and minimum number of tokens that exit on any output wire is bounded by K, 
regardless of the distribution of input tokens. Smoothing networks can be used 
for distributed load balancing. 

A 1-smoothing network is a counting network if it satisfies the same step
property as a balancer: when all tokens have traversed the network, if $y_i$ tokens
exit on output wire $i$, then $0 \le y_i - y_j \le 1$ for any $j > i$. Counting networks can
be used to implement Fetch&Increment counters: the $l$-th token to exit on the
$j$-th output wire returns the value $j + (l - 1)\,w_{out}$, where $w_{out}$ is the network's
fan-out.
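
To make the value formula concrete, the following sketch routes tokens sequentially through a single round-robin balancer, which on its own satisfies the step property and so serves here as the smallest possible stand-in for a counting network. The function names `round_robin_counter` and `has_step_property` are illustrative assumptions, and only sequential executions are modelled.

```python
def round_robin_counter(num_tokens, w_out):
    """Route tokens one by one through a balancer with fan-out w_out.

    Returns the per-wire exit counts and the Fetch&Increment value handed
    to each token, computed as j + (l - 1) * w_out for the l-th token that
    exits on wire j.
    """
    exits = [0] * w_out
    toggle = 0
    values = []
    for _ in range(num_tokens):
        j = toggle                     # Fetch&Toggle: read the current wire ...
        toggle = (toggle + 1) % w_out  # ... and advance to the next one
        exits[j] += 1
        l = exits[j]                   # this token is the l-th on wire j
        values.append(j + (l - 1) * w_out)
    return exits, values


def has_step_property(y):
    """0 <= y[i] - y[j] <= 1 for every pair of wires i < j."""
    return all(
        0 <= y[i] - y[j] <= 1
        for i in range(len(y))
        for j in range(i + 1, len(y))
    )
```

Because the exit counts always form a step, the values handed out are consecutive integers starting at 0 with no gaps or repeats.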

A limitation of counting networks is that they support increments but not 
decrements. Many synchronization algorithms and tools require the ability to 
decrement shared objects. 

Shavit and Touitou m devised the first counting network algorithm to sup- 
port decrements for the class of networks that have the layout of a binary tree. 
They did so by introducing a new type of token for the decrement operation, 
which they named the antitoken.^ Unlike a token, which traverses a balancer
by fetching the toggle value and then advancing it, an antitoken sets the toggle 
back and then fetches it. Informally, an antitoken “cancels” the effect of the 
most recent token on the balancer’s toggle state, and vice versa. They provide 



^ The name was actually suggested by Yehuda Afek (personal communication). 
an operational proof that counting trees uni count correctly when traversed by
tokens and antitokens. 
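
The difference between the two traversal orders can be seen in a small sequential model. This is a sketch under assumptions, not the authors' implementation: the class name `Balancer` and its methods are hypothetical, concurrency is ignored, and each output wire records the number of tokens minus the number of antitokens that left on it.

```python
class Balancer:
    """Sequential model of a balancer that handles tokens and antitokens."""

    def __init__(self, fan_out):
        self.fan_out = fan_out
        self.toggle = 0
        self.y = [0] * fan_out  # tokens minus antitokens per output wire

    def token(self):
        # A token fetches the toggle value and then advances it.
        j = self.toggle
        self.toggle = (self.toggle + 1) % self.fan_out
        self.y[j] += 1
        return j

    def antitoken(self):
        # An antitoken sets the toggle back and then fetches it.
        self.toggle = (self.toggle - 1) % self.fan_out
        j = self.toggle
        self.y[j] -= 1
        return j
```

In this model an antitoken leaves on the same wire as the most recent token and restores the toggle, so a token followed by an antitoken returns the balancer to its previous state.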

Shavit and Touitou HI also introduced the notion of elimination. One can 
use a balancing network to implement a pool, a kind of concurrent stack. If a 
token representing an enqueue operation meets a token representing a dequeue 
operation in the network, then they can “cancel” one another immediately, with- 
out traversing the rest of the network. 

It is natural to ask whether the same properties hold for arbitrary counting 
networks. More generally, what properties of balancing networks are preserved 
by the introduction of antitokens? In this paper, we give the first general answer 
to this question. We show the following results. 

— If a balancing network is a counting network for tokens, then it is also a 
counting network for both tokens and antitokens. As a result any counting 
network can be extended to support a Fetch&Decrement operation.

— Any counting network, not just elimination trees, permits tokens and anti- 
tokens to eliminate one another. 

— If a balancing network is a K-smoothing network when inputs are tokens,
then it remains a K-smoothing network when inputs include both tokens
and antitokens. 

— We identify a broad class of properties, which we call boundedness properties, 
that are preserved by the introduction of antitokens: if a balancing network 
satisfies a particular boundedness property when inputs are tokens, then 
it continues to satisfy that property when inputs include both tokens and 
antitokens. The step property and the K-smoothing property are examples
of boundedness properties. 

Unlike earlier work HHI, our proofs are combinatorial, not operational. They 
rely on the novel concept of a fooling pair of input vectors, which, we believe, is 
of independent interest. 

We assign the value 1 to each token and -1 to each antitoken. We treat a 
balancer as an operator carrying an integer input vector to an integer output 
vector. The $i$-th entry in the input vector represents the algebraic sum of the
tokens and antitokens received on the $i$-th input wire, and similarly for the
output vector. For example, if this value is zero, then the same number of tokens 
and antitokens have arrived on that wire. We treat a balancing network in the 
same way, as an “operator” on integer vectors. 

A boundedness property is a set of possible output vectors satisfying 

— it is a subset of the K-smoothing property, for some $K \ge 1$, and

— it is closed under the addition of any constant vector. 

Both the K-smoothing and the step property are examples of boundedness
properties. Our principal result is that any balancing network that satisfies a
boundedness property for non-negative integer input vectors also satisfies that property
for arbitrary integer input vectors.

The state of a balancer is the “position” of its toggle. Two input vectors 
form a fooling pair to a balancer if, starting in the same state, each “drives” 
the balancer to the same state. Similarly, a balancing network state is given by
its balancers’ states. Two input vectors form a fooling pair for that network if, 
starting from the same state, each drives the network to the same state. For a 
specific initial state of a balancing network, its fooling pairs define equivalence 
classes of input vectors. 

Roughly speaking, we prove our main equivalence result as follows. Consider 
any balancing network with some boundedness property; take any arbitrary in- 
teger input vector and the corresponding integer output vector. By adding to 
the input vector an appropriate vector that belongs to the equivalence class for 
some given initial state, we obtain a new input vector such that all of its en- 
tries are non-negative integers. We show that the output vector corresponding 
to the new input vector is, in fact, equal to the original output vector plus a 
constant vector. Hence, our main equivalence result follows from closure of the 
boundedness property under addition with a constant vector. 

2 Framework 

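For any integer $g \ge 2$, $\mathbf{x}^{(g)}$ denotes the vector $(x_0, x_1, \dots, x_{g-1})^{\mathrm{T}}$, while
$\lceil \mathbf{x}^{(g)} \rceil$ denotes the integer vector $(\lceil x_0 \rceil, \lceil x_1 \rceil, \dots, \lceil x_{g-1} \rceil)^{\mathrm{T}}$. For any vector
$\mathbf{x}^{(g)}$, denote $\|\mathbf{x}^{(g)}\|_1 = \sum_{i=0}^{g-1} x_i$. We use $\mathbf{0}^{(g)}$ to denote $(0, 0, \dots, 0)^{\mathrm{T}}$, a vector
with $g$ zero entries; similarly, we use $\mathbf{1}^{(g)}$ to denote $(1, 1, \dots, 1)^{\mathrm{T}}$, a vector with
$g$ unit entries. We use $\mathbf{r}^{(g)}$ to denote the ramp vector $(0, 1, \dots, g - 1)^{\mathrm{T}}$. A constant
vector is any vector of the form $c\,\mathbf{1}^{(g)}$, for any constant $c$.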

Balancing networks are constructed from acyclically wired elements, called 
balancers, that route tokens and antitokens through the network, and wires. For 
generality, balancers may have arbitrary fan-in and fan-out, and they handle 
both tokens and antitokens. 

For any pair of positive integers $f_{in}$ and $f_{out}$, an $(f_{in}, f_{out})$-balancer, or
balancer for short, is a routing element receiving tokens and antitokens on $f_{in}$ input
wires, numbered $0, 1, \dots, f_{in} - 1$, and sending out tokens and antitokens to $f_{out}$
output wires, numbered $0, 1, \dots, f_{out} - 1$; $f_{in}$ and $f_{out}$ are called the balancer's
fan-in and fan-out, respectively. Tokens and antitokens arrive on the balancer's
input wires at arbitrary times, and they are output on its output wires. Roughly
speaking, a balancer acts like a “generalized” toggle, which, on a stream of input
tokens and antitokens, alternately forwards them to its output wires, going
either down or up on each input token and antitoken, respectively. For clarity, we
assume that all tokens and antitokens are distinct. Figure 1 depicts a balancer
with three input wires and five output wires, stretched horizontally; the balancer
is stretched vertically. In the left part, tokens and antitokens are denoted with
full and empty circles, respectively; the numbering reflects the real-time order
of tokens and antitokens in an execution where they traverse the balancer one
by one (called a sequential execution).

For each input index $i$, $0 \le i \le f_{in} - 1$, we denote by $x_i$ the balancer input
state variable that stands for the algebraic sum of the numbers of tokens and
antitokens that have entered on input wire $i$; that is, $x_i$ is the number of tokens
that have entered on input wire $i$ minus the number of antitokens that have
entered on input wire $i$. Denote $\mathbf{x}^{(f_{in})} = (x_0, x_1, \dots, x_{f_{in}-1})^{\mathrm{T}}$; call $\mathbf{x}^{(f_{in})}$ an input
vector. For each output index $j$, $0 \le j \le f_{out} - 1$, we denote by $y_j$ the balancer
output state variable that stands for the algebraic sum of the numbers of tokens
and antitokens that have exited on output wire $j$; that is, $y_j$ is the number of
tokens that have exited on output wire $j$ minus the number of antitokens that
have exited on output wire $j$. The right part of Fig. 1 shows the corresponding
input and output state variables. Denote $\mathbf{y}^{(f_{out})} = (y_0, y_1, \dots, y_{f_{out}-1})^{\mathrm{T}}$; call
$\mathbf{y}^{(f_{out})}$ an output vector.

Fig. 1. A balancer

The configuration of a balancer at any given time is the tuple $\langle \mathbf{x}^{(f_{in})}, \mathbf{y}^{(f_{out})} \rangle$;
roughly speaking, the configuration is the collection of its input and output state
variables. In the initial configuration, all input and output wires are empty;
that is, in the initial configuration, $\mathbf{x}^{(f_{in})} = \mathbf{0}^{(f_{in})}$ and $\mathbf{y}^{(f_{out})} = \mathbf{0}^{(f_{out})}$. A
configuration of a balancer is quiescent if there are no tokens or antitokens in
the balancer. Note that the initial configuration is a quiescent one. The following
formal properties are required for an $(f_{in}, f_{out})$-balancer.

1. Safety property: in any configuration, a balancer never creates either tokens
or antitokens spontaneously.

2. Liveness property: for any finite number $t$ of tokens and $a$ of antitokens that
enter the balancer, the balancer reaches within a finite amount of time a
quiescent configuration where $t - e$ tokens and $a - e$ antitokens have exited
the balancer, where $e$, $0 \le e \le \min\{t, a\}$, is the number of tokens and
antitokens that are “eliminated” in the balancer.

3. Step property: in any quiescent configuration, for any pair of output indices
$j$ and $k$ such that $0 \le j < k \le f_{out} - 1$, $0 \le y_j - y_k \le 1$.

From the safety and liveness properties it follows, for any quiescent
configuration $\langle \mathbf{x}^{(f_{in})}, \mathbf{y}^{(f_{out})} \rangle$ of a balancer, that $\|\mathbf{x}^{(f_{in})}\|_1 = \|\mathbf{y}^{(f_{out})}\|_1$; that is, in a
quiescent configuration, the algebraic sum of tokens and antitokens that exited
the balancer is equal to the algebraic sum of tokens and antitokens that
entered it. The equality of sums holds also for the case where some of the tokens and
antitokens are “eliminated” in the balancer.
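For any input vector $\mathbf{x}^{(f_{in})}$, denote by $\mathbf{y}^{(f_{out})} = b(\mathbf{x}^{(f_{in})})$ the output vector in the
quiescent configuration that $b$ will reach after all $\|\mathbf{x}^{(f_{in})}\|_1$ tokens and antitokens
that entered $b$ have exited; write also $b : \mathbf{x}^{(f_{in})} \to \mathbf{y}^{(f_{out})}$ to denote the balancer
$b$. The output vector can also be written PEIIHI as

$$\mathbf{y}^{(f_{out})} = \left\lceil \frac{\|\mathbf{x}^{(f_{in})}\|_1 \, \mathbf{1}^{(f_{out})} - \mathbf{r}^{(f_{out})}}{f_{out}} \right\rceil .$$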
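
The closed form for the output vector can be compared with a direct sequential simulation. The sketch below is illustrative only: `closed_form` and `simulate` are hypothetical names, the balancer starts in state 0, and tokens and antitokens are given as a sequence of +1 and -1 signs. Entry $j$ of the closed form is $\lceil (\|\mathbf{x}\|_1 - j) / f_{out} \rceil$, computed with integer arithmetic.

```python
def closed_form(total, f_out):
    """Entry j is ceil((total - j) / f_out), using integer arithmetic only.

    ceil(a / b) equals -((-a) // b) for a positive integer b.
    """
    return [-((j - total) // f_out) for j in range(f_out)]


def simulate(signs, f_out):
    """Sequentially route tokens (+1) and antitokens (-1) from state 0."""
    toggle, y = 0, [0] * f_out
    for s in signs:
        if s > 0:
            y[toggle] += 1                  # fetch, then advance
            toggle = (toggle + 1) % f_out
        else:
            toggle = (toggle - 1) % f_out   # set back, then fetch
            y[toggle] -= 1
    return y
```

Only the algebraic sum of the inputs enters the formula, which matches the observation that the output in a quiescent configuration does not depend on the order or on the input wires of the arriving tokens and antitokens.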

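For any quiescent configuration $\langle \mathbf{x}^{(f_{in})}, \mathbf{y}^{(f_{out})} \rangle$ of a balancer $b : \mathbf{x}^{(f_{in})} \to
\mathbf{y}^{(f_{out})}$, the state of the balancer $b$, denoted $\mathrm{state}_b(\langle \mathbf{x}^{(f_{in})}, \mathbf{y}^{(f_{out})} \rangle)$, is defined to
be

$$\mathrm{state}_b(\langle \mathbf{x}^{(f_{in})}, \mathbf{y}^{(f_{out})} \rangle) = \|\mathbf{x}^{(f_{in})}\|_1 \bmod f_{out};$$

since the configuration is quiescent, it follows that

$$\mathrm{state}_b(\langle \mathbf{x}^{(f_{in})}, \mathbf{y}^{(f_{out})} \rangle) = \|\mathbf{y}^{(f_{out})}\|_1 \bmod f_{out}.$$

Thus, for the sake of simplicity, we will denote

$$\mathrm{state}_b(\mathbf{x}^{(f_{in})}) = \mathrm{state}_b(\langle \mathbf{x}^{(f_{in})}, \mathbf{y}^{(f_{out})} \rangle).$$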

We remark that the state of an $(f_{in}, f_{out})$-balancer is some integer in the
set $\{0, 1, \ldots, f_{out} - 1\}$, which captures the “position” to which it is set as a
toggle mechanism. This integer is determined by either the balancer input state 
variables or the balancer output state variables in the quiescent configuration. 
Note that the state of the balancer in the initial configuration is 0. Moreover, the 
linearity of the modulus operation immediately implies linearity for the balancer 
state. 

Lemma 1. Consider a balancer $b : x^{(f_{in})} \to y^{(f_{out})}$. Then, for any
input vectors $x_1^{(f_{in})}$ and $x_2^{(f_{in})}$,

$$state_b(x_1^{(f_{in})} + x_2^{(f_{in})}) = (state_b(x_1^{(f_{in})}) + state_b(x_2^{(f_{in})})) \bmod f_{out}.$$
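To make the definitions above concrete, the following Python sketch models only the quiescent behaviour of a single balancer. It is an illustration under stated assumptions, not part of the formal development: the function names (`balancer_output`, `balancer_state`, `vector_sum`) and the sample input vectors are hypothetical, and the output is computed from the closed form that the step property and the preservation of the algebraic sum force on the output vector.

```python
def balancer_output(x, f_out):
    """Quiescent output vector b(x) of a balancer with fan-out f_out.

    Entries of x may be negative (more antitokens than tokens on a wire).
    """
    total = sum(x)  # algebraic sum ||x||_1: tokens minus antitokens
    # Step property plus sum preservation force y_j = ceil((total - j) / f_out).
    return [-((j - total) // f_out) for j in range(f_out)]


def balancer_state(x, f_out):
    """State of the balancer: algebraic input sum modulo the fan-out."""
    return sum(x) % f_out  # Python's % is non-negative for a positive modulus


def vector_sum(u, v):
    """Entrywise sum of two input vectors of equal length."""
    return [a + b for a, b in zip(u, v)]


# Hypothetical inputs to a (2, 3)-balancer.
x1 = [3, -1]   # three tokens on wire 0, one antitoken on wire 1
x2 = [-2, -2]  # antitokens only
y = balancer_output(vector_sum(x1, x2), 3)
print(y, balancer_state(vector_sum(x1, x2), 3))
```

Because the state depends only on the algebraic sum modulo the fan-out, the linearity claimed in Lemma 1 can be checked directly on such examples.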

A $(w_{in}, w_{out})$-balancing network B is a collection of interwired balancers,
where output wires are connected to input wires, having $w_{in}$ designated input
wires, numbered $0, 1, \ldots, w_{in} - 1$, which are not connected to output wires of
balancers, having $w_{out}$ designated output wires, numbered $0, 1, \ldots, w_{out} - 1$,
similarly not connected to input wires of balancers, and containing no cycles.
Tokens and antitokens arrive on the network’s input wires at arbitrary times, 
and they traverse a sequence of balancers in the network in a completely asyn- 
chronous way till they exit on the output wires of the network. 

For each input index i, $0 \le i \le w_{in} - 1$, we denote by $x_i$ the network input
state variable that stands for the algebraic sum of the numbers of tokens and
antitokens that have entered on input wire i; that is, $x_i$ is the number of
tokens that have entered on input wire i minus the number of antitokens that
have entered on input wire i. Denote $x^{(w_{in})} = (x_0, x_1, \ldots, x_{w_{in}-1})^T$;
call $x^{(w_{in})}$ an input vector. For each output index j, $0 \le j \le w_{out} - 1$,
we denote by $y_j$ the network output state variable that stands for the algebraic
sum of the numbers of tokens and antitokens that have exited on output wire j;
that is, $y_j$ is the number of tokens that have exited on output wire j minus the
number of antitokens that have exited on output wire j. Denote
$y^{(w_{out})} = (y_0, y_1, \ldots, y_{w_{out}-1})^T$; call $y^{(w_{out})}$ an output vector.

The configuration of a network at any given time is the tuple of configurations 
of its individual balancers. In the initial configuration, all input and output 
wires of balancers are empty. The safety and liveness property for a balancing 
network follow naturally from those of its balancers. Thus, a balancing network 
eventually reaches a quiescent configuration in which all tokens and antitokens 
that entered the network have exited. In any quiescent configuration of B we
have $\|x^{(w_{in})}\|_1 = \|y^{(w_{out})}\|_1$; that is, in a quiescent configuration, the algebraic

sum of tokens and antitokens that exited the network is equal to the algebraic 
sum of tokens and antitokens that entered it. 

Naturally, we are interested in quiescent configurations of a network. For any
quiescent configuration of a network B with corresponding input and output
vectors $x^{(w_{in})}$ and $y^{(w_{out})}$, respectively, the state of B, denoted
$state_B(x^{(w_{in})})$, is defined to be the collection of the states of its
individual balancers. We remark that we have specified $x^{(w_{in})}$ as the single
argument of $state_B$, since $x^{(w_{in})}$ uniquely determines all input and output
vectors of balancers of B, which are used for defining the states of the
individual balancers. Note that the state of the network in its initial
configuration is a collection of 0’s. For any input vector $x^{(w_{in})}$, denote
$y^{(w_{out})} = B(x^{(w_{in})})$ the output vector in the quiescent configuration
that B will reach after all $\|x^{(w_{in})}\|_1$ tokens and antitokens that entered B
have exited; write also $B : x^{(w_{in})} \to y^{(w_{out})}$ to denote the network B.

Not all balancing networks satisfy the step property. A $(w_{in}, w_{out})$-counting
network is a $(w_{in}, w_{out})$-balancing network for which, in any quiescent
configuration, for any pair of indices j and k such that $0 \le j < k \le w_{out} - 1$,
$0 \le y_j - y_k \le 1$; that is, the output of a counting network has the step property.

The definition of a counting network can be weakened as follows. For
any integer $K \ge 1$, a $(w_{in}, w_{out})$-K-smoothing network is a $(w_{in}, w_{out})$-balancing
network for which, in any quiescent configuration, for any pair of indices j and
k such that $0 \le j, k \le w_{out} - 1$, $0 \le |y_j - y_k| \le K$; that is, the output vector
of a K-smoothing network has the K-smoothing property: all outputs are within
K of each other.

For a balancing network B, the depth of B, denoted depth(B), is defined to
be the maximum distance from any of its input wires to any of its output wires.
In case depth(B) = 1, B will be called a layer. If depth(B) = d is greater than
one, then B can be uniquely partitioned into layers $B_1, B_2, \ldots, B_d$ from left to
right in the obvious way.

Fix any integer $g \ge 2$. For any integer $K \ge 1$, the K-smoothing property
is defined to be the set of all vectors $y^{(g)}$ such that for any entries $y_j$ and
$y_k$ of $y^{(g)}$, where $0 \le j, k \le g - 1$, $|y_j - y_k| \le K$. A boundedness property
is any subset of some K-smoothing property, for any integer $K \ge 1$, that is
closed under addition with a constant vector. Clearly, the K-smoothing property
is trivially a boundedness property; moreover, the set of all vectors that have
the step property is a boundedness property, since any step vector is 1-smooth
(but not vice versa). We remark that there are infinitely many boundedness properties.
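The two ingredients of a boundedness property (bounded spread of the entries, and closure under adding a constant vector) can be checked mechanically. The following Python sketch is only an illustration; the helper names and the two sample vectors are hypothetical and do not come from the text.

```python
def has_step_property(y):
    """True if 0 <= y[j] - y[k] <= 1 for all indices j < k."""
    return all(0 <= y[j] - y[k] <= 1
               for j in range(len(y)) for k in range(j + 1, len(y)))


def is_k_smooth(y, K):
    """True if any two entries of y differ by at most K."""
    return max(y) - min(y) <= K


def add_constant(y, c):
    """Add the constant vector (c, ..., c) to y."""
    return [v + c for v in y]


step_vector = [2, 2, 1, 1]
smooth_only = [1, 2, 1, 2]  # 1-smooth, but an entry increases, so not a step vector

print(has_step_property(step_vector), is_k_smooth(smooth_only, 1))
print(has_step_property(add_constant(step_vector, -5)))
```

Shifting by a negative constant, as in the last line, is exactly the situation that arises once antitokens are allowed.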

A boundedness property captures precisely the two properties possessed by
both K-smooth and step vectors upon which our later proofs will rely. Although
we are unaware of any interesting property, other than the K-smoothing and
step, that is a boundedness one, we chose to state our results for any general
boundedness property in order to make explicit the two critical properties that
are common to the classes of K-smooth vectors and step vectors; moreover,
arguing in terms of a boundedness property will allow for a single proof of all
claims found to hold for both the K-smoothing property and the step property.

Say that a vector y has the boundedness property $\Pi$ if $y \in \Pi$. Say that a
balancing network $B : x^{(w_{in})} \to y^{(w_{out})}$ has the boundedness property $\Pi$ if for
every input vector $x^{(w_{in})}$, $B(x^{(w_{in})}) \in \Pi$.



3 Results 

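Input vectors $x_1^{(f_{in})}$ and $x_2^{(f_{in})}$ are a fooling pair to balancer
$b : x^{(f_{in})} \to y^{(f_{out})}$ if

$$state_b(x_1^{(f_{in})}) = state_b(x_2^{(f_{in})});$$

roughly speaking, inputs in a fooling pair drive the balancer to identical states.

Proposition 1. Consider a balancer $b : x^{(f_{in})} \to y^{(f_{out})}$. Take any input
vectors $x_1^{(f_{in})}$ and $x_2^{(f_{in})}$ that are a fooling pair to balancer b. Then,
for any input vector $x^{(f_{in})}$,

(1) the input vectors $x_1^{(f_{in})} + x^{(f_{in})}$ and $x_2^{(f_{in})} + x^{(f_{in})}$ are a
fooling pair to balancer b;

(2) $b(x_1^{(f_{in})} + x^{(f_{in})}) - b(x_2^{(f_{in})} + x^{(f_{in})}) = b(x_1^{(f_{in})}) - b(x_2^{(f_{in})})$.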

Input vectors $x_1^{(w_{in})}$ and $x_2^{(w_{in})}$ are a fooling pair to network
$B : x^{(w_{in})} \to y^{(w_{out})}$ if for each balancer b of B, the input vectors of b
in quiescent configurations corresponding to $x_1^{(w_{in})}$ and $x_2^{(w_{in})}$,
respectively, are a fooling pair to b; roughly speaking, a fooling pair “drives”
all balancers of the network to identical states in the two corresponding
quiescent configurations.

Proposition 2. Consider a balancing network $B : x^{(w_{in})} \to y^{(w_{out})}$. Take any
input vectors $x_1^{(w_{in})}$ and $x_2^{(w_{in})}$ that are a fooling pair to network B.
Then, for any input vector $x^{(w_{in})}$,

(1) the input vectors $x_1^{(w_{in})} + x^{(w_{in})}$ and $x_2^{(w_{in})} + x^{(w_{in})}$ are a
fooling pair to network B;

(2) $B(x_1^{(w_{in})} + x^{(w_{in})}) - B(x_2^{(w_{in})} + x^{(w_{in})}) = B(x_1^{(w_{in})}) - B(x_2^{(w_{in})})$.

Say that $x^{(w_{in})}$ is a null vector to network $B : x^{(w_{in})} \to y^{(w_{out})}$ if the
vectors $x^{(w_{in})}$ and $0^{(w_{in})}$ are a fooling pair to B. Intuitively, a null vector
“hides” itself in the sense that it does not alter the state of B by traversing it.

Proposition 3. Consider a balancing network $B : x^{(w_{in})} \to y^{(w_{out})}$. Take any
input vectors $x_1^{(w_{in})}$ and $x_2^{(w_{in})}$ that are a fooling pair to network B. If
$x_1^{(w_{in})}$ is a null vector to network B, then $x_2^{(w_{in})}$ is also a null vector
to network B.

Proposition 4. Consider a balancing network $B : x^{(w_{in})} \to y^{(w_{out})}$. Take any
input vector $x^{(w_{in})}$ that is null to B. Then, for any integer $k \ge 0$,

(1) $B(k\,x^{(w_{in})}) = k\,B(x^{(w_{in})})$;

(2) $k\,x^{(w_{in})}$ is a null vector to B.

For any balancing network B, let $W_{out}(B)$ denote the product of the fan-outs
of balancers of B. For positive integer $\delta$, say that $\delta$ divides $x^{(g)}$ if $\delta$
divides each entry of $x^{(g)}$.

Proposition 5. Consider a balancing network $B : x^{(w_{in})} \to y^{(w_{out})}$. If $W_{out}(B)$
divides $x^{(w_{in})}$, then $x^{(w_{in})}$ is a null vector to B.

Proposition 6. Consider any balancing network $B : x^{(w_{in})} \to y^{(w_{out})}$ that has
a boundedness property $\Pi$. If $W_{out}(B)$ divides $x^{(w_{in})}$, then $y^{(w_{out})}$ is a
constant vector.

Here is our main result: 

Theorem 1. Fix any boundedness property $\Pi$. Consider any balancing network
$B : x^{(w_{in})} \to y^{(w_{out})}$ such that $y^{(w_{out})}$ has the boundedness property $\Pi$
whenever $x^{(w_{in})}$ is a non-negative vector. Then, B has the boundedness property $\Pi$.
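As an informal sanity check of the statement, the following Python sketch evaluates the quiescent output of a small network on inputs that contain antitokens. The network used is the standard width-4 bitonic counting network built from (2,2)-balancers; it is not described in the text above and is brought in here only as an assumed example, and all function names are hypothetical. The sketch computes quiescent outputs only and does not simulate asynchronous token traversal.

```python
from itertools import product


def balancer2(a, b):
    """Quiescent outputs of a (2,2)-balancer: (ceil(s/2), floor(s/2)) for s = a + b."""
    s = a + b
    return -(-s // 2), s // 2


def bitonic4(x):
    """Quiescent output of the width-4 bitonic network (assumed wiring, six balancers)."""
    x0, x1 = balancer2(x[0], x[1])   # first layer: two width-2 counters
    x2, x3 = balancer2(x[2], x[3])
    z0, z1 = balancer2(x0, x3)       # merger: top of one half with bottom of the other
    u0, u1 = balancer2(x1, x2)
    y0, y1 = balancer2(z0, u0)       # final layer interleaves the two mergers
    y2, y3 = balancer2(z1, u1)
    return [y0, y1, y2, y3]


def has_step_property(y):
    """True if 0 <= y[j] - y[k] <= 1 for all indices j < k."""
    return all(0 <= y[j] - y[k] <= 1
               for j in range(len(y)) for k in range(j + 1, len(y)))


# Exhaustive check on small inputs, including negative entries (antitokens).
violations = [x for x in product(range(-3, 4), repeat=4)
              if not has_step_property(bitonic4(x))]
print(len(violations))
```

In this example the product of the fan-outs is $2^6 = 64$, so an input vector whose entries are all multiples of 64 behaves as a null vector, and adding it to any other input shifts the output by a constant vector.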

4 Conclusion 

We have shown that any balancing network that satisfies any boundedness
property on all non-negative input vectors continues to do so for any arbitrary
input vector. Interesting examples of such properties are the step property
and the K-smoothing property. A significant consequence of our result is
that all known (deterministic) constructions of counting and smoothing
networks will correctly handle both tokens and antitokens, and therefore
support both increment and decrement operations. Another significant
consequence is that the sufficient timing conditions for linearizability in
counting networks established in earlier work immediately carry over
to antitokens.

Aiello et al. present a randomized counting network based on randomized
balancers that toggle tokens according to some random permutation. We do not
know whether such randomized networks can support antitokens.

A balancing network has the threshold property if
$y_0 = \lceil \|x^{(w_{in})}\|_1 / w_{out} \rceil$, and the weak threshold property if
there is some output index j, possibly $j \ne 0$, such that
$y_j = \lceil \|x^{(w_{in})}\|_1 / w_{out} \rceil$. Since we have not established that either
of these properties is a boundedness property, our result does not necessarily
apply, and it remains unknown whether these properties are preserved by the
introduction of antitokens.

Abstract. In a system in which noncooperative agents share a common
resource, we propose the ratio between the worst possible Nash equilib- 
rium and the social optimum as a measure of the effectiveness of the 
system. Deriving upper and lower bounds for this ratio in a model in 
which several agents share a very simple network leads to some interest- 
ing mathematics, results, and open problems. 



1 Introduction 

Internet users and service providers act selfishly and spontaneously, without 
an authority that monitors and regulates network operation in order to achieve 
some “social optimum” such as minimum total delay. How much performance
is lost because of this? This question appears to exemplify a novel and timely 
genre of algorithmic problems, in which we are investigating the cost of the lack 
of coordination — as opposed to the lack of information (on-line algorithms) or 
the lack of unbounded computational resources (approximation algorithms). As 
we show in this paper, this point of view leads to some interesting algorithmic 
and combinatorial questions and results. 

It is nontrivial to arrive at a compelling mathematical formulation of this
question. Independent, non-cooperative agents obviously evoke game theory,
and its main concept of rational behavior, the Nash equilibrium: In an
environment in which each agent is aware of the situation facing all other
agents, a Nash equilibrium is a combination of choices (deterministic or
randomized), one for each agent, from which no agent has an incentive to
unilaterally move away. Nash equilibria are known not to always optimize overall
performance, with the Prisoner’s Dilemma being the best-known example.
Conditions under which Nash equilibria can achieve or approximate the overall
optimum have been studied extensively, including in studies on networks. However, this line
of previous work compares the overall optimum with the best Nash equilibrium, 
not the worst, as befits our line of reasoning. To put it otherwise, this previous 
research aims at achieving or approximating the social optimum by implicit acts 
of coordination, whereas we are interested in evaluating the loss to the system 
due to its deliberate lack of coordination. 

Game-theoretic aspects of the Internet have also been considered by
researchers associated with the Internet Society, with an eye towards
designing variants of the Internet Protocols which are more resilient to
video-like traffic. Their point of view is also that of the mechanism design
aspect of game theory, in that they try to design games (strategy spaces and
reward tables) that encourage behaviors close to the social optimum.
Understanding the worst-case distance of a Nash equilibrium from the social
optimum in simple situations, which is the focus of the present paper, is a
prerequisite for making rigorous progress in that project.



The Model 

Let us make the general game-theoretic framework more precise. Consider a 
network in which each link has a law (curve) whereby traffic determines delay. 
Each of several agents wants to send a particular amount of traffic along a path 
from a fixed source to a fixed destination. This immediately defines a game- 
theoretic framework, in which each agent has as many pure strategies as there are 
paths from its origin to its destination, and the cost to an agent of a combination 
of strategies (one for each agent) is the negative of the total delay for each agent, 
as determined by the traffic on the links. There is also a well-defined optimization 
problem, in which we wish to minimize the social or overall optimum, the sum 
of all delays over all agents, say. The question we want to ask is, how far from 
the optimum total delay can be the total delay achieved by a Nash equilibrium? 
Numerical experiments reported in the literature imply that there are Nash
equilibria which can be more than 20% off the overall optimum.

In this paper we address a very simple special case of this problem, in which
the network is just a set of m parallel links from an origin to a destination,
all with the same capacity (similar special cases are studied in other works in
this field; we also briefly examine the case of two parallel links with
unequal capacity). We model the delay of these links in a very simple way: Since
the capacity is unit, we assume that the delay suffered by each agent using
a link equals the total capacity of flow through this link. We assume that n
agents have each an amount of traffic $w_i$, $i = 1, \ldots, n$, to send from the origin to
the destination. Hence the resulting problem is essentially a scheduling problem
with m links and n independent tasks with lengths $w_i$, $i = 1, \ldots, n$. The set
of pure strategies for agent i is therefore $\{1, \ldots, m\}$, and a mixed strategy is
a distribution on this set. Let $(j_1, \ldots, j_n) \in \{1, \ldots, m\}^n$ be a combination of
pure strategies, one for each agent; its cost for agent i, denoted
$c_i(j_1, \ldots, j_n)$, is simply

$$L^{j_i} + \sum_{k:\, j_k = j_i} w_k,$$

the finish time of the link chosen by i; here we assume that link j has in the
beginning an initial task of length $L^j$ scheduled, so it will be available for
scheduling the agents’ tasks only after $L^j$ time units. This calculation assumes
that, if agent i’s task ends up in link j, it ends when all tasks on link j end;
this is realistic if the tasks are broken in packets, which are then sent in a
round-robin way. We also examine the alternative model, in which the tasks
scheduled in link j are executed in a random batch order, and hence the cost to
agent i is $c_i(j_1, \ldots, j_n) = L^{j_i} + \frac{1}{2} \sum_{k:\, j_k = j_i} w_k$; we call
this the batch model. Finally, the cost to agent i of a combination of mixed
strategies is the expected cost of the corresponding experiment in which a pure
strategy is chosen independently for each agent, with the probability assigned
to it by the mixed strategy. The overall optimum in this situation, against
which we propose to compare the Nash equilibria of the game just described,
would be the optimum solution of the m-way load balancing (partition into m
sets) problem for the n lengths $w_1, \ldots, w_n$.
The costs in our model are a simplification of the delays incurred in a network 
link when agents inject traffic into it. The actual delays are in fact not the 
sums of the individual delays, but nonlinear functions, as increased traffic causes 
increased loss rates and delays. We discuss briefly in the last section the open 
problems suggested by our work that are associated with more accurate modeling 
of network delays. 



The Results of This Paper 

In this paper we show upper and lower bounds on the ratio between the worst 
Nash equilibrium and the overall optimum solution. 

- In a network with two parallel links, we show that the worst-case ratio is
3/2 (both upper and lower bound), independent of the number n of agents
(Theorems 1 and 2).

- The above result assumes that the two link speeds are the same. If the two
links have different speeds, then the worst-case ratio increases to the golden
ratio $\phi = 1.618\ldots$ (lower bound, Theorem 3).

- Also, in the batch model of two links, the worst-case ratio is lower bounded
by 29/18 = 1.6111... which is also an upper bound if we have two agents
(Theorem 4).

- We have not been able to determine the answers for three or more links.
However, the worst-case ratio (in all of the above models) is bounded from
below by the ratio suggested by the load-balancing aspect of the problem,
that is to say, $\Omega(\log m / \log\log m)$ (Theorem 6). Using the Azuma-Hoeffding
inequality, we establish an $O(\sqrt{m \log m})$ upper bound (Theorem 8). A similar
bound holds for links of different speeds (Theorem 9).

2 All Nash Equilibria 

We consider the case of n agents sharing m identical links. Before describing all
Nash equilibria, we need a few definitions. We usually use subscripts for agents
and superscripts for links. For example, for a Nash equilibrium, we denote the
probability that agent i selects link j with $p_i^j$. Let $M^j$ denote the expected
traffic on link j. If $L^j$ is the initial load on link j, it is easy to see that

$$M^j = L^j + \sum_i p_i^j w_i. \qquad (1)$$

From the point of view of agent i, its finish time when its own traffic $w_i$ is
assigned to link j is

$$c_i^j = w_i + L^j + \sum_{t \ne i} p_t^j w_t = M^j + (1 - p_i^j) w_i. \qquad (2)$$

Probabilities $p_i^j$ define a Nash equilibrium if there is no incentive for agent i
to change its strategy. Thus, agent i will assign nonzero probabilities only to
links j that minimize $c_i^j$. We will denote this minimum value by $c_i$, i.e.,

$$c_i = \min_j c_i^j,$$

and we will call the set of links $S_i = \{j : p_i^j > 0\}$ the support of agent i. More
generally, let $S_i^j$ be an indicator variable that takes value 1 when $p_i^j > 0$.

Conversely, a Nash equilibrium is completely defined by the supports
$S_1, \ldots, S_n$ of all agents. More precisely, if we fix the $S_i^j$'s, the strategies in a
Nash equilibrium are given by

$$p_i^j = (M^j + w_i - c_i)/w_i \qquad (3)$$

subject to

for all j: $M^j = L^j + \sum_i S_i^j (M^j + w_i - c_i)$

for all i: $\sum_j S_i^j (M^j + w_i - c_i) = w_i$

To see that these constraints indeed define an equilibrium, notice that the
first set of equations is equivalent to (2). The constraints are equivalent to (1),
and to the fact that the probabilities of agent i should sum up to exactly 1. Notice
also that the set of constraints specifies, in general, a unique solution for $c_i$
and $M^j$ (there are n + m constraints and n + m unknowns). If the resulting
probabilities $p_i^j$ are in the interval (0, 1], then the above equations define an
equilibrium with support $S_i^j$. Thus, an equilibrium is completely defined by the
supports of the agents (although not all supports give rise to a feasible
equilibrium). As a result, the number of equilibria is, in general, exponential
in n and m.

A natural quantity associated with an equilibrium is the expected maximum
traffic over all links:

$$\mathrm{cost} = \sum_{j_1=1}^{m} \cdots \sum_{j_n=1}^{m} \prod_{i=1}^{n} p_i^{j_i} \max_{j=1,\ldots,m} \Big\{ L^j + \sum_{t:\, j_t = j} w_t \Big\}. \qquad (4)$$

We call it the social cost and we wish to compare it with the social optimum opt.
More precisely, we want to estimate the coordination ratio, which is the
worst-case ratio R = max cost/opt (the maximum is over all equilibria). Computing
the social optimum opt is an NP-complete problem (partition problem), but for the
purpose of upper bounding R here, it suffices to use two simple approximations
of it: $\mathrm{opt} \ge \max\{w_1, \sum_j M^j / m\} = \max\{w_1, (\sum_j L^j + \sum_i w_i)/m\}$
(we shall be assuming that $w_1 \ge w_2 \ge \cdots \ge w_n$).

3 Worst-Case Equilibria for 2 Links

We shall assume that there are no initial loads, that is, all $L^j$'s are zero.
This is no restriction at all for the standard model, because initial loads can be
considered as jobs of m additional agents, each with a pure strategy. However,
this may not be true for other models. In particular, in the batch model (the one
with the 1/2 factor in front of $\sum_k w_k$) it follows from our results that initial
loads result in strictly worse ratio.

Our first theorem is trivial: 

Theorem 1. The coordination ratio for 2 links is at least 3/2. 

Proof. Consider two agents with traffic $w_1 = w_2 = 1$. It is easy to check that
probabilities $p_i^j = 1/2$ for $i, j = 1, 2$ give rise to a Nash equilibrium. The expected
maximum load is cost = 3/2 and the social optimum is opt = 1 achieved by
allocating each job to its own link.
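The example in this proof is small enough to check by direct enumeration. The following Python sketch is an illustration only: it evaluates the social cost of expression (4) by summing over all pure outcomes and tests the equilibrium condition using the finish times of equation (2). The function names are hypothetical, and exact rational arithmetic is used so that equalities can be tested without rounding.

```python
from fractions import Fraction
from itertools import product


def social_cost(weights, probs, initial_loads):
    """Expected maximum link load, summing over all pure strategy combinations."""
    n, m = len(weights), len(initial_loads)
    total = Fraction(0)
    for choice in product(range(m), repeat=n):
        prob = Fraction(1)
        loads = list(initial_loads)
        for i, j in enumerate(choice):
            prob *= probs[i][j]       # agents choose independently
            loads[j] += weights[i]
        total += prob * max(loads)
    return total


def finish_times(i, weights, probs, initial_loads):
    """Expected finish time of agent i on each link j when it picks j for sure."""
    return [weights[i] + initial_loads[j]
            + sum(probs[t][j] * weights[t] for t in range(len(weights)) if t != i)
            for j in range(len(initial_loads))]


def is_nash(weights, probs, initial_loads):
    """Every link in an agent's support must attain that agent's minimum finish time."""
    for i in range(len(weights)):
        c = finish_times(i, weights, probs, initial_loads)
        if any(probs[i][j] > 0 and c[j] != min(c) for j in range(len(c))):
            return False
    return True


half = Fraction(1, 2)
weights = [1, 1]
probs = [[half, half], [half, half]]
print(is_nash(weights, probs, [0, 0]), social_cost(weights, probs, [0, 0]))
```

The enumeration is exponential in the number of agents, so this approach is only suitable for checking small examples.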

Our main technical result of this section is a matching upper bound. To
prove it, we find a way to upper bound the complicated expression (4) for the
social cost. In fact, it is relatively easy to compute the strategies of a Nash
equilibrium. There are 2 types of agents: pure strategy agents with support of
size one and stochastic agents with support of size 2. Let $d^j$ be the sum of all
jobs of pure strategy agents assigned to link j. Also let $k > 1$ denote the number
of stochastic agents. It is not difficult to verify that the system of equations (3)
gives the following probabilities of a stochastic agent i:

$$p_i^j = \frac{1}{2} + \frac{d^1 + d^2 - 2d^j}{2(k-1)w_i}. \qquad (5)$$

However, we don’t see how to use this expression to upper bound (4).

Central to our proof of the upper bound is the notion of contribution probability:
The contribution probability $q_i$ of agent i is equal to the probability that its
job goes to the link of maximum load (if there is more than one maximum
load link, we consider the lexicographically first such link, say). Clearly, the
social cost is given by $\mathrm{cost} = \sum_i q_i w_i$. The key idea in our proof is to consider
the pairwise contribution to social cost. In particular, let $t_{ik}$ be the collision
probability of agents i and k, that is, the probability that the traffic of both agents
goes to the same link. Observe then that both agents i and k can contribute to
the social cost only if they collide, that is,

$$q_i + q_k \le 1 + t_{ik}. \qquad (6)$$

The following lemma provides a crucial property of collision probabilities. It
holds for any number of links.

Lemma 1. The collision probabilities of a Nash equilibrium of n agents and m
links satisfy

$$\sum_{k \ne i} t_{ik} w_k = c_i - w_i.$$

Proof. Observe first that $t_{ik} = \sum_j p_i^j p_k^j$. Therefore, we have

$$\sum_{k \ne i} t_{ik} w_k = \sum_j p_i^j \sum_{k \ne i} p_k^j w_k = \sum_j p_i^j (M^j - p_i^j w_i).$$

It follows from (3) that we can use $p_i^j w_i = M^j + w_i - c_i$. There is a minor
technical point to be made here: the equality $p_i^j w_i = M^j + w_i - c_i$ holds only if
link j is in the support of agent i ($p_i^j > 0$). However, observe that when $p_i^j = 0$
there is no harm in replacing $p_i^j w_i$ with any expression. We get

$$\sum_{k \ne i} t_{ik} w_k = \sum_j p_i^j (c_i - w_i) = c_i - w_i.$$



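A final ingredient for the proof is the bound (which also holds for any number
of agents and links):

$$c_i \le \frac{1}{m} \sum_k w_k + \frac{m-1}{m} w_i. \qquad (7)$$

This follows from $c_i = \min_j c_i^j \le \frac{1}{m} \sum_j (M^j + (1 - p_i^j) w_i) = \frac{1}{m} \sum_k w_k + \frac{m-1}{m} w_i$.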
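Both the collision identity of Lemma 1 and bound (7) can be checked numerically on a simple equilibrium. The Python sketch below is an illustration under stated assumptions: it uses the uniform equilibrium in which m unit-weight agents each choose among m identical links with probability 1/m (no initial loads), and the helper names are hypothetical.

```python
from fractions import Fraction


def expected_traffic(weights, probs, j):
    """M^j with no initial loads: expected total weight sent to link j."""
    return sum(p[j] * w for p, w in zip(probs, weights))


def agent_cost(i, weights, probs):
    """c_i = min over links j of M^j + (1 - p_i^j) w_i, following equation (2)."""
    return min(expected_traffic(weights, probs, j) + (1 - probs[i][j]) * weights[i]
               for j in range(len(probs[i])))


def collision_sum(i, weights, probs):
    """Sum over k != i of t_ik * w_k, where t_ik = sum_j p_i^j p_k^j."""
    total = Fraction(0)
    for k in range(len(weights)):
        if k != i:
            t_ik = sum(a * b for a, b in zip(probs[i], probs[k]))
            total += t_ik * weights[k]
    return total


m = 3
weights = [1] * m
probs = [[Fraction(1, m)] * m for _ in range(m)]
for i in range(m):
    lhs = collision_sum(i, weights, probs)
    rhs = agent_cost(i, weights, probs) - weights[i]
    bound = Fraction(sum(weights), m) + Fraction(m - 1, m) * weights[i]
    print(i, lhs, rhs, agent_cost(i, weights, probs) <= bound)
```

In this symmetric example bound (7) holds with equality, since every link gives agent i the same expected finish time.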




Theorem 2. The coordination ratio for any number of agents and m = 2 links 
is at most 3/2. 

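Proof. We have seen that pairwise the contribution probabilities satisfy
$q_i + q_k \le 1 + t_{ik}$. Therefore, $\sum_{k \ne i} (q_i + q_k) w_k \le \sum_{k \ne i} (1 + t_{ik}) w_k$.
Using Lemma 1 and bound (7), we get $\sum_{k \ne i} (q_i + q_k) w_k \le \frac{3}{2} \sum_{k \ne i} w_k$.
From this we can compute

$$\mathrm{cost} = \sum_k q_k w_k \le \Big(\frac{3}{2} - q_i\Big) \sum_k w_k + \Big(2q_i - \frac{3}{2}\Big) w_i.$$

Recall that $\mathrm{opt} \ge \max\{\frac{1}{2} \sum_k w_k, w_i\}$. If for some agent i, $q_i \ge \frac{3}{4}$, then
$(2q_i - \frac{3}{2}) w_i \le (2q_i - \frac{3}{2})\mathrm{opt}$ and
$\mathrm{cost} \le (\frac{3}{2} - q_i) 2\mathrm{opt} + (2q_i - \frac{3}{2})\mathrm{opt} = \frac{3}{2}\mathrm{opt}$. Otherwise,
when all contribution probabilities are at most $\frac{3}{4}$,
$\mathrm{cost} = \sum_k q_k w_k \le \frac{3}{4} \sum_k w_k \le \frac{3}{2}\mathrm{opt}$.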

Links with Different Speeds 

So far, we assumed that all links have the same speed or capacity. We now
consider the general problem where links may have different speeds. Let $s_j$ be
the speed of link j. Without loss of generality, we shall assume $s_1 \le \cdots \le s_m$.
We can estimate all Nash equilibria again. Equation (2) now becomes

$$c_i^j = (M^j + (1 - p_i^j) w_i)/s_j \qquad (8)$$

and the equilibria are given by:

$$p_i^j = (M^j + w_i - s_j c_i)/w_i \qquad (9)$$

subject to

for all j: $M^j = L^j + \sum_i S_i^j (M^j + w_i - s_j c_i)$

for all i: $\sum_j S_i^j (M^j + w_i - s_j c_i) = w_i$

We can extend the lower bound of Theorem 1 to this case:



Theorem 3. The coordination ratio for two links with speeds $s_1 \le s_2$ is at least
$R = 1 + s_2/(s_1 + s_2)$ when $s_2 \le \phi s_1$, where $\phi = (1 + \sqrt{5})/2$. The coordination
ratio R achieves its maximum value $\phi$ when $s_2/s_1 = \phi$.

Proof. We first describe the equilibria for any number of agents. Again let $d^j$ be
the sum of all traffic assigned to link j by pure agents. We give the probabilities
$p_i^1$ of the stochastic agents ($p_i^2 = 1 - p_i^1$):

$$p_i^1 = \frac{s_2}{s_1 + s_2} - \frac{(s_2 - s_1) \sum_k w_k + s_2 d^1 - s_1 d^2}{(k-1)(s_1 + s_2) w_i},$$

where the sum ranges over the k stochastic agents. It is not hard to verify that
these probabilities indeed satisfy (9). To prove the theorem, we consider the case
of no initial loads and two agents with jobs $w_1 = s_2$ and $w_2 = s_1$. The
probabilities are $p_1^1 = \frac{s_1^2}{s_2(s_1 + s_2)}$ and $p_2^2 = \frac{s_2^2}{s_1(s_1 + s_2)}$. We can
then compute
$\mathrm{cost} = (p_1^1 p_2^1/s_1 + p_1^2 p_2^2/s_2)(w_1 + w_2) + (p_1^1 p_2^2/s_1 + p_1^2 p_2^1/s_2) w_1 = (s_1 + 2s_2)/(s_1 + s_2)$
and opt = 1. The lower bound follows.

It is worth mentioning that when $s_2/s_1 > \phi$ the probabilities given above are
outside the interval [0,1]. Therefore, both agents have pure strategies and the
coordination ratio is 1.
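The two-agent example in this proof can be verified with exact arithmetic. The Python sketch below is an illustration, not the authors' computation: the probabilities are the ones obtained by equating each agent's expected finish time on the two links, the code checks that indifference directly rather than assuming it, and the function names and the sample speeds (2 and 3) are hypothetical choices.

```python
from fractions import Fraction


def two_link_example(s1, s2):
    """Weights and mixed strategies for two agents with w_1 = s_2 and w_2 = s_1."""
    s1, s2 = Fraction(s1), Fraction(s2)
    total_speed = s1 + s2
    p11 = s1 ** 2 / (s2 * total_speed)   # agent 1 on link 1
    p22 = s2 ** 2 / (s1 * total_speed)   # agent 2 on link 2
    weights = [s2, s1]
    probs = [[p11, 1 - p11], [1 - p22, p22]]
    return weights, probs, [s1, s2]


def finish_times(i, weights, probs, speeds):
    """Expected finish time of agent i on each link, with link speeds (two agents)."""
    other = 1 - i
    return [(weights[i] + probs[other][j] * weights[other]) / speeds[j]
            for j in range(2)]


def social_cost(weights, probs, speeds):
    """Expected maximum finish time over the four pure outcomes."""
    total = Fraction(0)
    for j1 in range(2):
        for j2 in range(2):
            loads = [Fraction(0), Fraction(0)]
            loads[j1] += weights[0]
            loads[j2] += weights[1]
            total += probs[0][j1] * probs[1][j2] * max(loads[0] / speeds[0],
                                                       loads[1] / speeds[1])
    return total


weights, probs, speeds = two_link_example(2, 3)
print(social_cost(weights, probs, speeds), 1 + speeds[1] / (speeds[0] + speeds[1]))
```

For speed ratios above the golden ratio the second probability exceeds 1, which is the infeasibility noted in the remark above.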



We believe that the proof of Theorem 2 can be appropriately generalized to
the case of links of different speeds.



The Batch Model 

For the batch model with two links we can prove the following bounds (proof 
omitted) : 

Theorem 4. In the batch model with two identical links, the coordination ratio
is between 29/18 = 1.61... and 2. The lower bound is also an upper bound in
the case of n = 2 agents.

When the links have no initial load, the batch model and the standard model 
have the same equilibria and the same coordination ratio. However, in the general 
case, as the above theorem demonstrates, the batch model has higher coordina- 
tion ratio. But it cannot be much higher: 

Theorem 5. For m links and any number of agents, the coordination ratios of
the batch model and the standard model differ by at most a factor of 2.

We omit the details of the proof, but we point out the main idea: We can consider
the initial loads of the batch model as pure strategy agents of weight $2L^j$.
This preserves the equilibria and changes the social optimum by at most a factor
of 2.



4 Worst-Case Equilibria for m Links 



We now consider lower bounds for the coordination ratio for m links. 

Theorem 6. The coordination ratio for m identical links is $\Omega(\log m / \log\log m)$.

Proof. Consider the case where there are m agents, each with a unit job, i.e.,
$w_i = 1$. If the links have no initial load, it is easy to see that the uniform strategies
with $p_i^j = 1/m$ for $i, j = 1, \ldots, m$ is an equilibrium. This is identical to the
problem of throwing m balls into m bins and asking for the expected maximum
number of balls in a bin. The answer is well-known to be $\Theta(\log m / \log\log m)$.
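For very small m, the expected maximum number of balls in a bin can be computed exactly by enumerating all $m^m$ equally likely placements. The Python sketch below is an illustration with a hypothetical function name; it is feasible only for small m and says nothing about the asymptotic growth rate.

```python
from fractions import Fraction
from itertools import product


def expected_max_load(m):
    """Exact expected maximum bin load when m balls fall uniformly into m bins."""
    total = 0
    for placement in product(range(m), repeat=m):
        counts = [0] * m
        for bin_index in placement:
            counts[bin_index] += 1
        total += max(counts)
    return Fraction(total, m ** m)


for m in range(1, 6):
    print(m, expected_max_load(m))
```

For m = 2 this value equals 3/2, the coordination ratio established for two identical links.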



We believe that this lower bound is tight: That is, if $T_m$ denotes the expected
maximum number of balls in a bin, we conjecture that the coordination ratio
for any number of agents and m identical links is $T_m$ (in the standard model).
Theorem 2 shows that the conjecture holds for m = 2.

We believe that a proof of the conjecture can be obtained by appropriately
generalizing the proof technique of Theorem 2; it seems however that a
substantially deeper structural theorem about the Nash equilibria, similar to
Lemma 1, is needed. Here, we give a weaker upper bound. But first we need the
following theorem, which is interesting on its own.

Theorem 7. For m identical links, the expected load $M^j$ of any link j is at most
$(2 - 1/m)\mathrm{opt}$. For links with different speeds, $M^j$ is at most
$s_j(1 + \sqrt{m-1})\mathrm{opt}$.

Proof. For identical links the theorem follows directly from (7) by observing that,
for an agent i with link j in its support,
$M^j \le c_i \le (\sum_k w_k)/m + (m-1)w_i/m \le (2 - 1/m)\mathrm{opt}$.

The proof for links with different speeds has the same flavor as (7). This
time we take a weighted average over the links (the weight for machine j is
$s_j / \sum_r s_r$). Thus, $c_i \le (\sum_r w_r + (m-1) w_i) / \sum_r s_r$.
Also, $c_i \le c_i^m \le (M^m + w_i)/s_m \le (\sum_r w_r + w_i)/s_m$. In summary,

$$c_i \le \min\Big\{ \frac{\sum_r w_r + (m-1) w_i}{\sum_r s_r}, \frac{\sum_r w_r + w_i}{s_m} \Big\}.$$

However, we can lower bound the social optimum by
$\max\{w_i/s_m, \sum_r w_r / \sum_r s_r\}$. Thus, we get
$c_i \le \mathrm{opt} + \min\{(m-1) w_i / \sum_r s_r, \sum_r w_r / s_m\}$. Using the obvious inequality
$\min\{xa/b, c/d\} \le \sqrt{x} \max\{a/d, c/b\}$, we get $c_i \le (1 + \sqrt{m-1})\mathrm{opt}$. We can then
conclude that $M^j \le s_j c_i \le s_j(1 + \sqrt{m-1})\mathrm{opt}$.

We can now prove an upper bound for the case of m identical links.

Theorem 8. The coordination ratio of any number of agents and m identical
links is at most $T = 3 + \sqrt{4m \ln m}$.

Proof. Using a martingale concentration bound known as the Azuma-Hoeffding
inequality [3], we will show that the load of a given link j exceeds $(T-1)\mathrm{opt}$ with
probability at most $1/m^2$. Then, the probability that the maximum load on all
links does not exceed $(T-1)\mathrm{opt}$ is at least $1 - 1/m$. It follows that the expected
maximum load is bounded by $(1 - 1/m)(T-1)\mathrm{opt} + (1/m)(m\,\mathrm{opt}) \le T\,\mathrm{opt}$.

It remains to show that indeed the probability that the load of a given link
j exceeds $(T-1)\mathrm{opt}$ is small (at most $1/m^2$). Let $X_i$ be a random variable
denoting the contribution of agent i to the load of link j. In particular,
$\Pr[X_i = w_i] = p_i^j$ and $\Pr[X_i = 0] = 1 - p_i^j$. Clearly, the random variables
$X_1, \ldots, X_n$ are independent. We are interested in estimating the probability
$\Pr[\sum_i X_i > (T-1)\mathrm{opt}]$. Since the weights $w_i$ and the probabilities $p_i^j$ may vary
a lot, we don’t expect the sum $\sum_i X_i$ to exhibit the good concentration bounds of
sums of binomial variables. However, we can get a weaker bound using the
Azuma-Hoeffding inequality. The inequality gives very good results for
probabilities around 1/2. Unfortunately, in our case the probabilities may be
very close to 0 or 1.

Let $\mu_i = E[X_i]$ and consider the martingale
$Y_t = X_1 + \cdots + X_t + \mu_{t+1} + \cdots + \mu_n$
(it is straightforward to verify $E[Y_{t+1} \mid Y_t] = Y_t$). Observe that
$|Y_{t+1} - Y_t| = |X_{t+1} - \mu_{t+1}| \le w_{t+1}$. We can then apply the
Azuma-Hoeffding inequality:

$$\Pr[Y_n - Y_0 \ge x] \le e^{-x^2 / (2 \sum_i w_i^2)}.$$

Let $x = (T-3)\mathrm{opt}$. Since $Y_0 = \sum_i \mu_i = M^j \le 2\mathrm{opt}$ (Theorem 7), we get that
the load of link j exceeds $(T-1)\mathrm{opt}$ with probability at most
$e^{-(T-3)^2 \mathrm{opt}^2 / (2 \sum_i w_i^2)}$. However, it is not hard to establish that

$$\sum_i w_i^2 \le \max\Big\{ m w_1^2, m \Big(\sum_i w_i / m\Big)^2 \Big\} \le m\,\mathrm{opt}^2.$$

Thus the probability that the load of link j exceeds $(T-1)\mathrm{opt}$ is at most
$e^{-(T-3)^2/(2m)}$. For $T = 3 + \sqrt{4m \ln m}$, this probability becomes $1/m^2$ and the
proof is complete.
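The choice of T in the last step is a short calculation that can be checked numerically. The Python sketch below is an illustration with hypothetical function names; it evaluates the bound of Theorem 8 and confirms that the tail expression $e^{-(T-3)^2/(2m)}$ equals $1/m^2$ for that choice of T, up to floating-point error.

```python
import math


def coordination_upper_bound(m):
    """T = 3 + sqrt(4 m ln m), the upper bound of Theorem 8 for m identical links."""
    return 3 + math.sqrt(4 * m * math.log(m))


def tail_bound(T, m):
    """Azuma-Hoeffding tail e^{-(T-3)^2 / (2m)} for a single link."""
    return math.exp(-((T - 3) ** 2) / (2 * m))


for m in (2, 10, 100, 1000):
    T = coordination_upper_bound(m)
    print(m, round(T, 3), tail_bound(T, m), 1 / m ** 2)
```

Since $(T-3)^2 = 4m \ln m$, the exponent is $-2 \ln m$, which is where the $1/m^2$ comes from.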

It is worth noticing that the only structural property of Nash equilibria we 
needed in the proof of the above theorem is that the expected load of a link j is 
at most 2opt (and, of course, the independence of the agent strategies). We can 
use a similar proof to extend the theorem to the case of m links with different 
speeds: 

Theorem 9. The coordination ratio of any number of agents and m different 
links is 0(y^7f Ej if-Vlogm). 
5 Discussion and Open Problems 

We believe that the approach introduced in this paper, namely evaluating the 
worst-case ratio of Nash equilibria to the social optimum, may prove a useful 
calculation in many contexts. Although the Nash equilibrium is not trivial to 
reach without coordination, it does serve as an important indicator of the kinds 
of behaviors exhibited by noncooperative agents. 

Besides bridging the gaps left open in our theorems, there are several ex- 
tensions of this work that seem interesting, namely, investigating with the same 
point of view more complex and realistic cost models, for example, when the 
cost is given by ^ min{c^ii; } ^ capacity of a link and ^ Wi its 

load [7]. More important is the study of realistic Internet metrics that result from the employed protocols, such as the one related to TCP and the square root of the drop frequency 0. Finally, it would be extremely interesting, once the relative quality of the Nash equilibria in such situations is better understood, to employ such understanding in the design of improved protocols.
Abstract. We advocate analyzing the average complexity of learning problems. An appropriate framework for this purpose is introduced. Based on it we consider the problem of learning monomials and the special case of learning monotone monomials in the limit and for on-line predictions in two variants: from positive data only, and from positive and negative examples. The well-known Wholist algorithm is completely analyzed, in particular its average-case behavior with respect to the class of binomial distributions. We consider different complexity measures: the number of mind changes, the number of prediction errors, and the total learning time. Tight bounds are obtained implying that worst-case bounds are too pessimistic. On the average, learning can be achieved exponentially faster.

Furthermore, we study a new learning model, stochastic finite learning, in which, in contrast to PAC learning, some information about the underlying distribution is given and the goal is to find a correct (not only approximately correct) hypothesis. We develop techniques to obtain good bounds for stochastic finite learning from a precise average-case analysis of strategies for learning in the limit and illustrate our approach for the case of learning monomials.



1. Introduction 

Learning concepts efficiently has attracted considerable attention during the 
last decade. However, research following the traditional lines of inductive in- 
ference has mainly considered the update time, i.e., the effort to compute a 
single new hypothesis. Starting with Valiant’s paper m, the total amount of 
time needed to solve a given learning problem has been investigated as well. The 
complexity bounds proved within the PAC model are usually worst-case bounds. 
In experimental studies large gaps have often been observed between the time 
bounds obtained by a mathematical analysis and the actual runtime of a learner 
on typical data. This phenomenon can be explained easily. Data from running 
tests provide information about the average-case performance of a learner, rather 
than its worst-case behavior. Since algorithmic learning has a lot of practical ap- 
plications it is of great interest to analyze the average-case performance, and to 
obtain tight bounds saying something about the typical behavior in practice. 

Pazzani and Sarrett m have proposed a framework for analyzing the average- 
case behavior of learning algorithms. Several authors have followed their approach. Their main goal is to predict the expected accuracy of the hypothesis produced with respect to the number of training examples. However, the results obtained so far are not satisfactory. Typically, the probability
that a random example is misclassified by the current hypothesis is estimated 
by a complicated formula. The evaluation of this formula, and the computation 
of the corresponding expectation has been done by Monte-Carlo simulations. 
Clearly, such an approach does not provide general results about the average- 
case behavior for broader classes of distributions. Moreover, it is hard to compare 
these bounds with those proved for the PAC model. 

We outline a new setting to study the average-case behavior of learning al- 
gorithms overcoming these drawbacks and illustrate it for learning monomials. 

2. Preliminaries 

Let $\mathbb{N} = \{0, 1, 2, \ldots\}$ be the set of all natural numbers, and let $\mathbb{N}^+ := \mathbb{N} \setminus \{0\}$. If $M$ is a set, $|M|$ is used for its cardinality. For an infinite sequence $d$ and $j \in \mathbb{N}^+$ let $d[j]$ denote the initial segment of $d$ of length $j$. By $(0,1)$ we denote the real interval from 0 to 1 excluding both endpoints. For $n \in \mathbb{N}^+$, let $X_n = \{0,1\}^n$ be the learning domain and $\wp(X_n)$ the power set of $X_n$. A subset $c$ of $X_n$ is called a concept, and a subset $\mathcal{C}$ of $\wp(X_n)$ a concept class. The notation $c$ is also used to denote the characteristic function of a subset, that is, for $b \in X_n$: $c(b) = 1$ iff $b \in c$. To define the classes of concepts we deal with in this paper let $\mathcal{L}_n = \{x_1, \bar{x}_1, x_2, \bar{x}_2, \ldots, x_n, \bar{x}_n\}$ be a set of literals. $x_i$ is a positive literal and $\bar{x}_i$ a negative one. A conjunction of literals defines a monomial. For a monomial $m$ let $\#(m)$ denote its length, that is, the number of literals in it.

$m$ describes a subset $L(m)$ of $X_n$, in other words a concept, in the obvious way: the concept contains exactly those binary vectors for which the monomial evaluates to 1, that is $L(m) := \{b \in X_n \mid m(b) = 1\}$. The collection of objects we are going to learn is the set $\mathcal{C}_n$ of all concepts that are describable by monomials over $\mathcal{L}_n$. There are two trivial concepts, the empty subset and $X_n$ itself. $X_n$, which will also be called “TRUE”, can be represented by the empty monomial. The concept “FALSE” has several descriptions. To avoid ambiguity, we always represent “FALSE” by the monomial $x_1 \bar{x}_1 \cdots x_n \bar{x}_n$. Furthermore, we often identify the set of all monomials over $\mathcal{L}_n$ and the concept class $\mathcal{C}_n$. Note that $|\mathcal{C}_n| = 3^n + 1$. We also consider the subclass $\mathcal{MC}_n$ of $\mathcal{C}_n$ consisting of those concepts that can be described by monotone monomials, i.e., by monomials containing positive literals only. It holds $|\mathcal{MC}_n| = 2^n$.
3. Learning Models and Complexity Measures 

The first learning model we are dealing with is the on-line prediction model 
going back to Barzdin, Freivald [[] and Littlestone m- In this setting the source 
of information is specified as follows. The learner is given a sequence of labeled examples $d = (d_j)_{j \in \mathbb{N}^+} = (b_1, c(b_1), b_2, c(b_2), b_3, c(b_3), \ldots)$ from the concept $c$, where the $b_j \in X_n$, and $c(b_j) = 1$ if $b_j \in c$ and $c(b_j) = 0$ otherwise. The examples $b_j$ are picked arbitrarily and the information provided is assumed to be without any errors. We refer to such sequences as data sequences and use $data(c)$ to denote the set of all data sequences for concept $c$.

A learner $P$ must predict $c(b_j)$ after having seen $d[2j-1] = (b_1, c(b_1), \ldots, b_{j-1}, c(b_{j-1}), b_j)$. We denote this hypothesis by $P(d[2j-1])$. Then it receives the true value $c(b_j)$ and the next Boolean vector $b_{j+1}$. The learner has successfully learned if it eventually reaches a point beyond which it always predicts correctly.

Definition 1. A concept class $\mathcal{C}$ is called on-line predictable if there is a learner $P$ such that for all concepts $c \in \mathcal{C}$ and all data sequences $d = (d_j)_{j \in \mathbb{N}^+} \in data(c)$ it holds: $P(d[2j-1])$ is defined for all $j$, and $P(d[2j-1]) = d_{2j}$ for all but finitely many $j$.

For on-line prediction, the complexity measure considered is the number 
of prediction errors made. Note that the prediction goal can always be achieved 
trivially if the learning domain is finite. Therefore, we aim to minimize the 
number of prediction errors when learning monomials. 

Next, let us define Gold-style 0 learning in the limit. One distinguishes 
between learning from positive and negative data, and learning from positive 
data only. For a concept $c$, let $info(c)$ be the set of those data sequences $(b_1, c(b_1), b_2, c(b_2), b_3, c(b_3), \ldots)$ in $data(c)$ that contain each element $b$ of the learning domain $X_n$ at least once. Such a sequence is called an informant. For ease of notation let us pair each element $b_j$ with its classification $c(b_j)$. Then the $j$-th entry of an informant sequence $d$ will be $d_j := (b_j, c(b_j))$.

A positive presentation of $c$ is a data sequence that contains only elements of $c$ and each one at least once. Thus all the values $c(b_j)$ are equal to 1 and could be omitted. In this case we will denote the sequence simply by $d = (d_j)_{j \in \mathbb{N}^+} = (b_1, b_2, b_3, \ldots)$. Let $d[j]^+ := \{b_i \mid 1 \le i \le j\}$ be the set of all examples contained in the prefix of $d$ of length $j$, and let $pos(c)$ denote the set of all positive presentations of $c$. The elements of $pos(c)$ are also called texts for $c$.

A limit learner is an inductive inference machine (abbr. IIM). An IIM 
M works as follows. As inputs it gets incrementally growing segments of a pos- 
itive presentation (resp. of an informant) d. After each new input, it outputs 
a hypothesis M{d[j]) from a predefined hypothesis space H. Each hypothesis 
refers to a unique element of the concept class. 

Definition 2. Let $\mathcal{C}$ be a concept class and let $\mathcal{H}$ be a hypothesis space for it. $\mathcal{C}$ is called learnable in the limit from positive presentation (resp. from informant) if there is an IIM $M$ such that for every $c \in \mathcal{C}$ and every $d \in pos(c)$ (resp. $d \in info(c)$): $M(d[j])$ is defined for all $j$, and $M(d[j]) = h$ for all but finitely many $j$, where $h \in \mathcal{H}$ is a hypothesis referring to $c$.
For the concept class $\mathcal{C}_n$ we choose as hypothesis space the set of all monomials over $\mathcal{L}_n$, whereas for $\mathcal{MC}_n$ it is the set of all monotone monomials. Again, we are interested in how efficiently $\mathcal{C}_n$ and $\mathcal{MC}_n$ can be learned in the limit.

The first complexity measure we consider is the mind change complexity. A mind change occurs iff $M(d[j]) \ne M(d[j+1])$. Clearly, this measure is closely related to the number of prediction errors. Both complexity measures say little about the total amount of data and time needed until a concept is guessed correctly. Thus, for learning in the limit we also measure the time complexity. As in [3] we define the total learning time as follows. Let $M$ be any IIM learning a concept class $\mathcal{C}$ in the limit. Then, for $c \in \mathcal{C}$ and a text or informant $d$ for $c$, let

$Con(M,d)$ = the least $i \in \mathbb{N}^+$ such that $M(d[j]) = M(d[i])$ for all $j \ge i$

denote the stage of convergence of $M$ on $d$ (cf. |S|). Moreover, by $T_M(d[j])$ we denote the number of steps to compute $M(d[j])$. We measure this quantity as a function of the length of the input and refer to it as the update time. Finally, the total learning time taken by the IIM $M$ on a sequence $d$ is defined as $TT(M,d) := \sum_{j=1}^{Con(M,d)} T_M(d[j])$. Given a probability distribution $D$ on the data sequences $d$ we evaluate the expectation of $TT(M,d)$ with respect to $D$, the average total learning time.

4. The Wholist Algorithm Learning Monomials 

Next, we present Haussler’s jSl Wholist algorithm for on-line prediction of 
monomials. For learning in the limit this algorithm can be modified straight- 
forwardly. The limit learner computes a new hypothesis using only the most 
recent example received and his old hypothesis. Such learners are called iterative 
(cf. 0 ). Let $c \in \mathcal{C}_n$, let $d \in data(c)$, and let $b_i = b_i^1 b_i^2 \cdots b_i^n$ denote the $i$-th Boolean vector in $d$. Recall that “TRUE” is represented by the empty monomial.
Algorithm $\mathcal{P}$: On input sequence $(b_1, c(b_1), b_2, c(b_2), \ldots)$ do the following:

    Initialize $h_0 := x_1 \bar{x}_1 \cdots x_n \bar{x}_n$.
    for $i = 1, 2, \ldots$ do
        let $h_{i-1}$ denote $\mathcal{P}$'s internal hypothesis produced before receiving $b_i$;
        when receiving $b_i$ predict $h_{i-1}(b_i)$; read $c(b_i)$;
        if $h_{i-1}(b_i) = c(b_i)$ then $h_i := h_{i-1}$
        else for $j := 1$ to $n$ do
                 if $b_i^j = 1$ then delete $\bar{x}_j$ in $h_{i-1}$ else delete $x_j$ in $h_{i-1}$;
             let $h_i$ be the resulting monomial
    end.

Note that the algorithm is monotone with respect to the sequence of its internal hypotheses: $h_i \ge h_{i-1}$ when considered as functions on $X_n$.

Theorem 1. Algorithm $\mathcal{P}$ learns the set of all monomials within the prediction model. It makes at most $n + 1$ prediction errors.
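
The prediction loop of the Wholist algorithm can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the representation of a literal as a pair `(j, polarity)` with 0-based bit index `j`, and the function names `initial_hypothesis`, `evaluate` and `wholist`, are assumptions made here for the example.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical encoding: (j, True) stands for x_j, (j, False) for its negation.
Literal = Tuple[int, bool]


def initial_hypothesis(n: int) -> Set[Literal]:
    """h_0 contains every literal over n bits, so it represents "FALSE"."""
    return {(j, pol) for j in range(n) for pol in (True, False)}


def evaluate(h: Set[Literal], b: List[int]) -> int:
    """Value of the monomial h on vector b; the empty monomial is "TRUE"."""
    return int(all((b[j] == 1) == pol for j, pol in h))


def wholist(n: int, examples: Iterable[Tuple[List[int], int]]) -> Tuple[Set[Literal], int]:
    """Run the prediction loop; return the final hypothesis and the error count."""
    h = initial_hypothesis(n)
    errors = 0
    for b, label in examples:
        if evaluate(h, b) != label:
            errors += 1
            # Keep only the literals that the misclassified example satisfies,
            # i.e. delete every literal that b falsifies.
            h = {(j, pol) for j, pol in h if (b[j] == 1) == pol}
    return h, errors
```

For instance, feeding all eight labeled vectors of length 3 for the target $x_1 \wedge \bar{x}_3$ (bits 0 and 2 in the 0-based encoding) leaves exactly the two target literals in the hypothesis, and the number of prediction errors stays within the $n + 1$ bound of Theorem 1.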

To learn the monotone concept class $\mathcal{MC}_n$, algorithm $\mathcal{P}$ can be easily modified by initializing $h_0 = x_1 x_2 \cdots x_n$ and by simplifying the loop appropriately. We refer to the modified algorithm as $\mathcal{MP}$.

Theorem 1 can be directly reproved for $\mathcal{MP}$ with the only difference that now the worst-case bound for the number of prediction errors is $n$ instead of $n + 1$.
5. Complexity Analysis: Best and Worst Case 

For the learning models defined above we estimate the best-case complexity, the worst-case complexity, and the expectation of algorithms $\mathcal{P}$ and $\mathcal{MP}$. We start with the first two issues. Each algorithm makes no prediction errors iff the initial hypothesis equals the target monomial. For $\mathcal{P}$ this means that the concept to be learned is “FALSE”, while for $\mathcal{MP}$ the concept consists of the all-1 vector only. These special concepts can be considered as minimal in their class. For them the best-case and the worst-case number of prediction errors coincide.

In the general case, we call the literals in a monomial $m$ relevant. All other literals in $\mathcal{L}_n$ (resp. in $\{x_1, \ldots, x_n\}$ in the monotone case) are said to be irrelevant for $m$. There are $2n - \#(m)$ irrelevant literals in general, and $n - \#(m)$ in the monotone case. We call bit $i$ relevant for $m$ if $x_i$ or $\bar{x}_i$ is relevant for $m$. By $k = k(m) = n - \#(m)$ we denote the number of irrelevant bits.

Theorem 2. Let $c = L(m)$ be a non-minimal concept in $\mathcal{MC}_n$. Then algorithm $\mathcal{MP}$ makes 1 prediction error in the best case, and $k(m)$ prediction errors in the worst case.

If $c$ is a non-minimal concept of $\mathcal{C}_n$, algorithm $\mathcal{P}$ makes 2 prediction errors in the best case and $1 + k(m)$ prediction errors in the worst case.

As Theorem 2 shows, the gap between the best-case and worst-case behavior can be quite large. Thus, we ask what are the expected bounds for the number of prediction errors on randomly generated data sequences. Before answering this question we estimate the worst-case number of prediction errors averaged over the whole concept class $\mathcal{MC}_n$, resp. $\mathcal{C}_n$. Thus we get a complexity bound with respect to the parameter $n$, instead of $\#(m)$ as in Theorem 2. This averaging depends on the underlying probability distribution for selecting the target concepts (for the corresponding data sequences we consider the worst input). The average is shown to be linear in $n$ if the literals are binomially distributed.

To generate the probability distributions we assume for $\mathcal{MC}_n$ the relevant positive literals to be drawn independently at random with probability $p$, $p \in (0,1)$. Thus, with probability $1 - p$ a literal is irrelevant. The length of the monomials drawn by this distribution is binomially distributed with parameter $p$. Thus we call such a distribution on the concept class a binomial distribution.

Theorem 3. Let the concepts in $\mathcal{MC}_n$ be binomially distributed with parameter $p$. Then the average number of prediction errors of $\mathcal{MP}$ for the worst data sequences is $n(1 - p)$.

In case $p = 1/2$, the bound says that the maximal number of prediction errors when uniformly averaged over all concepts in $\mathcal{MC}_n$ is $n/2$.
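
The value $n(1-p)$ can be recovered by a short calculation. The sketch below assumes that each of the $n$ positive literals is irrelevant independently with probability $1 - p$, and that the worst-case number of prediction errors for a concept with $k$ irrelevant bits is $k$ (Theorem 2; this also covers the minimal concept, where $k = 0$ and no error occurs):

```latex
\[
E[\text{worst-case errors}]
  \;=\; \sum_{k=0}^{n} \binom{n}{k} (1-p)^{k} p^{\,n-k} \cdot k
  \;=\; n(1-p),
\]
```

which is the mean of a binomial distribution with $n$ trials and success probability $1 - p$. For $p = 1/2$ this gives $n/2$.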

Next, we deal with the class $\mathcal{C}_n$. For comparing it to the monotone case we have to clarify what it means for concepts in $\mathcal{C}_n$ to be binomially distributed. Since there are $3^n + 1$ many concepts in $\mathcal{C}_n$, for a uniform distribution each concept must have probability $1/(3^n + 1)$. For each position $i = 1, \ldots, n$ three options are possible, i.e., we may choose $x_i$, $\bar{x}_i$ or neither of them. This suggests the formula $\binom{n}{k_1, k_2, k_3}\, p_1^{k_1} p_2^{k_2} p_3^{k_3}$, where $p_1$ is the probability to take $x_i$, $p_2$ the probability to choose $\bar{x}_i$ and $p_3$ the probability to choose none, and $p_1 + p_2 + p_3 = 1$ and $k_1 + k_2 + k_3 = n$. $k_1 + k_2$ counts the number of relevant literals, resp. bits. However, this formula does not include the concept “FALSE.” Thus, let us introduce $p_f \in (0,1)$ for the probability to choose “FALSE.” Then the formula becomes $(1 - p_f) \binom{n}{k_1, k_2, k_3}\, p_1^{k_1} p_2^{k_2} p_3^{k_3}$. We call such a probability distribution a weighted multinomial distribution with parameters $(p_f, p_1, p_2, p_3)$.

Theorem 4. Let the concepts in $\mathcal{C}_n$ occur according to a weighted multinomial distribution with parameters $(p_f, p_1, p_2, p_3)$. Then the average number of prediction errors of $\mathcal{P}$ for the worst data sequences is $(1 - p_f)(1 + n p_3)$.

For the particular case that all concepts from $\mathcal{C}_n$ are equally likely, i.e., $p_1 = p_2 = p_3 = 1/3$ and $p_f = 1/(3^n + 1)$, we directly get that on the average less than $n/3 + 1$ errors are to be expected given the worst data sequences. Hence, in this case the class $\mathcal{C}_n$ seems to be easier to learn than $\mathcal{MC}_n$ with respect to the complexity measure prediction errors. However, this impression is a bit misleading, since the probabilities to generate an irrelevant literal are different, i.e., 1/3 for $\mathcal{C}_n$ and 1/2 for $\mathcal{MC}_n$. If we assume the probabilities to generate an irrelevant literal to be equal, say $q$, and make the meaningful assumption that “FALSE” has the same probability as “TRUE”, then the average complexity is $\frac{1}{1+q^n}(1 + nq)$ for $\mathcal{C}_n$ and $nq$ for $\mathcal{MC}_n$. Since for $\mathcal{C}_n$ it holds $q = 1 - (p_1 + p_2)$, and for $\mathcal{MC}_n$ $q = 1 - p$, under these assumptions $\mathcal{MC}_n$ is easier to learn than $\mathcal{C}_n$. This insight is interesting, since it clearly shows the influence of the underlying distribution. In contrast, previous work has expressed these bounds in terms of the VC-dimension, which is the same for both classes, i.e., $n$.

The results above directly translate to learning in the limit from informant or 
from positive presentations for the complexity measure number of mind changes. 

What can be said about the total learning time? The best case can be handled as above. Since the update time is linear in $n$ for both algorithms $\mathcal{MP}$ and $\mathcal{P}$, in the best case the total learning time is linear. The worst-case total learning time is unbounded for both algorithms, since every text and informant may contain arbitrarily many repetitions of data not possessing enough information to learn the target.

Hence, as far as learning in the limit and the complexity measure total learning time are concerned, there is a huge gap between the best-case and the worst-case behavior. Since the worst case is unbounded, it does not make sense to ask for an analogue to Theorems 3 and 4. Instead, we continue by studying the average-case behavior of the limit learners $\mathcal{P}$ and $\mathcal{MP}$.

6. Average-Case Analysis for Learning in the Limit from Text 

For the following average-case analysis we assume that the data sequences are generated at random with respect to some probability distribution $D$ taken from a class of admissible distributions $\mathcal{D}$ specified below. We are interested in the average number of examples till an algorithm has converged to a correct hypothesis. CON denotes a random variable counting the number of examples till convergence. Let $d$ be a text of the concept $c$ to be learned that is generated at random according to $D$. If the concept to be learned is “FALSE” no examples are needed. Otherwise, if the target concept contains precisely $n$ literals then one positive example suffices (note that this one is unique). Thus, for these two cases everything is clear and the probability distributions $D$ on the set of positive examples for $c$ are trivial.

For analyzing the nontrivial cases, let $c = L(m) \in \mathcal{C}_n$ be a concept with monomial $m$ such that $k = k(m) = n - \#(m) > 0$. There are $2^k$ positive examples for $c$. For the sake of presentation, we assume these examples to be binomially distributed. That is, in a random positive example all entries corresponding to irrelevant bits are selected independently of each other. With some probability $p$ this will be a 1, and with probability $q := 1 - p$ a 0. We shall consider only nontrivial distributions where $0 < p < 1$. Note that otherwise the data sequence does not contain all positive examples. We aim to compute the expected number of examples taken by $\mathcal{P}$ until convergence.

The first example received forces $\mathcal{P}$ to delete precisely $n$ of the $2n$ literals in $h_0$. Thus, this example always plays a special role. Note that the resulting hypothesis $h_1$ depends on $b_1$, but the number $k$ of literals that remain to be deleted from $h_1$ until convergence is independent of $b_1$. Using tail bound techniques, we can show the following theorem.

Theorem 5. Let $c = L(m)$ be a non-minimal concept in $\mathcal{C}_n$, and let the positive examples for $c$ be binomially distributed with parameter $p$. Define $\psi := \min\{\frac{1}{1-p}, \frac{1}{p}\}$. Then the expected number of positive examples needed by algorithm $\mathcal{P}$ until convergence can be bounded by $E[\mathrm{CON}] \le \lceil \log_\psi k(m) \rceil + 3$.
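
As an empirical sanity check (not a proof), the convergence stage can be simulated for one parameter choice. The sketch below relies on the observation made above: after the first positive example, the hypothesis still contains one literal per irrelevant bit, and that literal is deleted as soon as a later example disagrees with the first one on that bit. The function names `simulate_con` and `bound`, and the choice $k = 10$, $p = 1/2$, are assumptions made for this illustration.

```python
import random


def simulate_con(k: int, p: float, rng: random.Random) -> int:
    """Number of positive examples until all k irrelevant bits have shown both values."""
    # Values of the k irrelevant bits in the first example.
    first = [rng.random() < p for _ in range(k)]
    pending = set(range(k))   # bits whose literal is still in the hypothesis
    count = 1                 # the first example is always needed
    while pending:
        count += 1
        # A bit stays pending only while new examples agree with the first one on it.
        pending = {j for j in pending if (rng.random() < p) == first[j]}
    return count


def bound(k: int, p: float) -> int:
    """ceil(log_psi k) + 3 with psi = min(1/(1-p), 1/p), computed without float logs."""
    psi = min(1.0 / (1.0 - p), 1.0 / p)
    exponent, power = 0, 1.0
    while power < k:          # smallest exponent with psi**exponent >= k
        power *= psi
        exponent += 1
    return exponent + 3
```

Averaging `simulate_con(10, 0.5, rng)` over a few thousand runs with a seeded `random.Random` gives a mean that can be compared against `bound(10, 0.5)`, which evaluates to 7.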

A similar analysis can be given in the monotone setting for algorithm $\mathcal{MP}$.

Corollary 6. For every binomially distributed text with parameter $0 < p < 1$ the average total learning time of algorithm $\mathcal{P}$ for concepts in $\mathcal{C}_n$ with $\mu$ literals is at most $O(n \log(n - \mu + 2))$.

The expectation alone does not provide complete information about the average-case behavior of an algorithm. We would also like to deduce bounds on how often the algorithm exceeds the average considerably. The Wholist algorithm possesses two favorable properties that simplify this derivation considerably, i.e., it is set-driven and conservative. Set-driven means that for all $c \in \mathcal{C}_n$, all $d, h \in pos(c)$ and all $i, j \in \mathbb{N}^+$ the equality $d[i]^+ = h[j]^+$ implies $\mathcal{P}(d[i]) = \mathcal{P}(h[j])$. A learner is said to be conservative if every mind change is caused by an inconsistency with the data seen so far. Clearly, the Wholist algorithm satisfies this condition, too. Now, the following theorem establishes exponentially shrinking tail bounds for the expected number of examples needed in order to achieve convergence.
Theorem 7 (USD- Let CON be the sample complexity of a conservative and set- 
driven learning algorithm. Then $\Pr[\mathrm{CON} > 2t \cdot E[\mathrm{CON}]] \le 2^{-t}$ for all $t \in \mathbb{N}$.

A simple calculation shows that in case of exponentially shrinking tail bounds the variance is bounded by $O(E[\mathrm{CON}]^2)$.

7. Stochastic Finite Learning 

Next we shall show how to convert the Wholist algorithm into a text learner that identifies all concepts in $\mathcal{C}_n$ stochastically in a bounded number of rounds with high confidence. A bit of additional knowledge concerning the underlying class of probability distributions is required. Thus, in contrast to the PAC model, the resulting learning model is not distribution-free. But with respect to the quality of its hypotheses, it is stronger than the PAC model by requiring the output to be probably exactly correct rather than probably approximately correct. The main advantage is the usage of the additional knowledge to reduce the sample size, and hence the total learning time, drastically. This contrasts with previous work
in the area of PAC learning (cf., e.g., 121417111111171 1. These papers have shown 
concept classes to be PAC learnable from polynomially many examples given a known distribution or class of distributions, while the general PAC learnability of these concept classes is not achievable or remains open. Note that our general approach, i.e., performing an average-case analysis and proving exponentially shrinking tail bounds for the expected total learning time, can also be applied
to obtain results along this line (cf. [11 f>| I tij l. 

Definition 3. Let $\mathcal{D}$ be a set of probability distributions on the learning domain, $\mathcal{C}$ a concept class, $\mathcal{H}$ a hypothesis space for $\mathcal{C}$, and $\delta \in (0,1)$. $(\mathcal{C}, \mathcal{D})$ is said to be stochastically finite learnable with $\delta$-confidence with respect to $\mathcal{H}$ iff there is an IIM $M$ that for every $c \in \mathcal{C}$ and every $D \in \mathcal{D}$ performs as follows. Given a random presentation $d$ for $c$ generated according to $D$, $M$ stops after having seen a finite number of examples and outputs a single hypothesis $h \in \mathcal{H}$. With probability at least $1 - \delta$ (with respect to distribution $D$) $h$ has to be correct, that is $L(h) = c$ in case of monomials. If stochastic finite learning can be achieved with $\delta$-confidence for every $\delta > 0$ then we say that $(\mathcal{C}, \mathcal{D})$ can be learned stochastically finite with high confidence.

We study the case that the positive examples are binomially distributed with parameter $p$. But we do not require precise knowledge about the underlying distribution. Instead, we reasonably assume that prior knowledge is provided by parameters $p_{low}$ and $p_{up}$ such that $p_{low} \le p \le p_{up}$ for the true parameter $p$. Binomial distributions fulfilling this requirement are called $(p_{low}, p_{up})$-admissible distributions. Let $\mathcal{D}_n[p_{low}, p_{up}]$ denote the set of such distributions on $X_n$.

If bounds $p_{low}$ and $p_{up}$ are available, the Wholist algorithm can be transformed into a stochastic finite learner inferring all concepts with high confidence.

Theorem 8. Let $0 < p_{low} \le p_{up} < 1$ and $\psi := \min\{\frac{1}{1 - p_{low}}, \frac{1}{p_{up}}\}$. Then $(\mathcal{C}_n, \mathcal{D}_n[p_{low}, p_{up}])$ is stochastically finitely learnable with high confidence from positive presentations. To achieve $\delta$-confidence no more than $O(\log_2 1/\delta \cdot \log_\psi n)$ many examples are necessary.

The latter example bound can even be improved to $\log_\psi n + \log_\psi 1/\delta + O(1)$ by performing a careful error analysis, i.e., for the Wholist algorithm, the confidence requirement increases the sample size by an additive term $\log_\psi 1/\delta$ only.
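
One way to turn the two ingredients into a concrete stopping rule is sketched below. It is an illustration under stated assumptions rather than the construction used in the paper: it assumes the expectation bound $E[\mathrm{CON}] \le \lceil \log_\psi n \rceil + 3$ (Theorem 5 with $k(m) \le n$) for every admissible $p$, and a tail bound of the form $\Pr[\mathrm{CON} > 2t \cdot E[\mathrm{CON}]] \le 2^{-t}$, and it simply reads $2t$ times the expectation bound with $t = \lceil \log_2 1/\delta \rceil$. The function names `ceil_log` and `sample_size` are hypothetical.

```python
import math


def ceil_log(base: float, value: float) -> int:
    """Smallest integer e >= 0 with base**e >= value (requires base > 1)."""
    exponent, power = 0, 1.0
    while power < value:
        power *= base
        exponent += 1
    return exponent


def sample_size(n: int, p_low: float, p_up: float, delta: float) -> int:
    """Number of examples to read before stopping, under the assumptions above."""
    if not (0.0 < p_low <= p_up < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("need 0 < p_low <= p_up < 1 and 0 < delta < 1")
    psi = min(1.0 / (1.0 - p_low), 1.0 / p_up)
    expected_bound = ceil_log(psi, n) + 3       # assumed bound on E[CON], using k(m) <= n
    t = math.ceil(math.log2(1.0 / delta))       # chosen so that 2**(-t) <= delta
    return 2 * t * expected_bound
```

For example, `sample_size(1024, 0.5, 0.5, 1/16)` evaluates to $2 \cdot 4 \cdot (10 + 3) = 104$, which has the $O(\log_2 1/\delta \cdot \log_\psi n)$ shape stated in Theorem 8. The improved additive bound mentioned above requires the finer error analysis and is not captured by this sketch.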

8. Average-Case Analysis for Learning in the Limit from Informant 



Finally, we consider how the results obtained so far translate to the case of learning from informant. First, we investigate the uniform distribution over $X_n$. Again, we have the trivial cases that the target is “FALSE” or $m$ is a monomial without irrelevant bits. In the first case, no example is needed at all, while in the latter one, there is only one positive example having probability $2^{-n}$. Thus the expected number of examples needed until successful learning is $2^n = 2^{\#(m)}$.
Theorem 9. Let $c = L(m) \in \mathcal{C}_n$ be a nontrivial concept. If an informant for $c$ is generated from the uniform distribution by independent draws, the expected number of examples needed by algorithm $\mathcal{P}$ until convergence is bounded by $E[\mathrm{CON}] \le 2^{\#(m)} (\lceil \log_2 k(m) \rceil + 3)$.

Hence, as long as $k(m) = n - O(1)$, we still achieve an expected total learning time $O(n \log n)$. But if $\#(m) = \Omega(n)$ the expected total learning time is exponential. However, if there are many relevant literals then even $h_0$ may be considered as a not too bad approximation for $c$. Thus, let $\varepsilon \in (0,1)$ be an error parameter as in the PAC model. We ask if one can achieve an expected sample complexity for computing an $\varepsilon$-approximation that is polynomially bounded in $\log n$ and $1/\varepsilon$.

Let $err_m(h_j) := D(L(h_j) \,\triangle\, L(m))$ be the error made by hypothesis $h_j$ with respect to monomial $m$. Here $L(h_j) \,\triangle\, L(m)$ is the symmetric difference of $L(h_j)$ and $L(m)$, and $D$ the probability distribution with respect to which the examples are drawn. $h_j$ is an $\varepsilon$-approximation for $m$ if $err_m(h_j) \le \varepsilon$. Finally, we redefine the stage of convergence. Let $d = (d_j)_{j \in \mathbb{N}^+} \in info(L(m))$, then

$\mathrm{CON}_\varepsilon(d)$ = the least number $j$ such that $err_m(\mathcal{P}(d[i])) \le \varepsilon$ for all $i \ge j$.

Note that once the Wholist algorithm has reached an $\varepsilon$-approximate hypothesis all further hypotheses will also be at least that close to the target monomial. The following theorem gives an affirmative answer to the question posed above.

Theorem 10. Let $c = L(m) \in \mathcal{C}_n$ be a nontrivial concept. Assuming that examples are drawn independently from the uniform distribution, the expected number of examples needed by algorithm $\mathcal{P}$ until converging to an $\varepsilon$-approximation for $c$ can be bounded by $E[\mathrm{CON}_\varepsilon] \le \frac{1}{\varepsilon} \cdot (\lceil \log_2 k(m) \rceil + 3)$.

Thus, additional knowledge concerning the underlying probability distribution pays off again. Using Theorem 7 and modifying Section 7 mutatis mutandis, we achieve stochastic finite learning with high confidence for all concepts in $\mathcal{C}_n$ using $O(\frac{1}{\varepsilon} \cdot \log \frac{1}{\delta} \cdot \log n)$ many examples. However, the resulting learner now infers $\varepsilon$-approximations. Comparing this bound with the sample complexity given in the PAC model one notes an exponential reduction.

Finally, we generalize the last results to the case that the data sequences are binomially distributed for some parameter $p \in (0,1)$. This means that any particular vector containing $\nu$ times a 1 and $n - \nu$ times a 0 has probability $p^{\nu}(1-p)^{n-\nu}$, since a 1 is drawn with probability $p$ and a 0 with probability $1 - p$.

First, Theorem 9 generalizes as follows.

Theorem 11. Let $c = L(m) \in \mathcal{C}_n$ be a nontrivial concept. Let $m$ contain precisely $\pi$ positive literals and $\tau$ negative literals. If the labeled examples for $c$ are independently binomially distributed with parameter $p$ and if $\psi := \min\{\frac{1}{1-p}, \frac{1}{p}\}$, then the expected number of examples needed by algorithm $\mathcal{P}$ until convergence can be bounded by $E[\mathrm{CON}] \le \frac{1}{p^{\pi}(1-p)^{\tau}} \left( \lceil \log_\psi k(m) \rceil + 3 \right)$.

Theorem 10 directly translates into the setting of binomially distributed inputs.

Theorem 12. Let $c = L(m) \in \mathcal{C}_n$ be a nontrivial concept. Assume that the examples are drawn with respect to a binomial distribution with parameter $p$, and let $\psi = \min\{\frac{1}{1-p}, \frac{1}{p}\}$. Then the expected number of examples needed by algorithm $\mathcal{P}$ until converging to an $\varepsilon$-approximation for $c$ can be bounded by $E[\mathrm{CON}_\varepsilon] \le \frac{1}{\varepsilon} \cdot (\lceil \log_\psi k(m) \rceil + 3)$.
Finally, one can also learn $\varepsilon$-approximations stochastically finite with high confidence from informant with an exponentially smaller sample complexity.

Theorem 13. Let $0 < p_{low} \le p_{up} < 1$ and $\psi := \min\{\frac{1}{1 - p_{low}}, \frac{1}{p_{up}}\}$. For $(\mathcal{C}_n, \mathcal{D}_n[p_{low}, p_{up}])$ $\varepsilon$-approximations are stochastically finitely learnable with $\delta$-confidence from informant for all $\varepsilon, \delta \in (0,1)$. Further, $O(\frac{1}{\varepsilon} \cdot \log_2 1/\delta \cdot \log_\psi n)$, resp. $O(\frac{1}{\varepsilon} \cdot (\log_\psi 1/\delta + \log_\psi n))$ many examples suffice for this purpose.
Abstract. Leo Harrington surprisingly constructed a machine which can learn any computable function $f$ according to the following criterion (called $Bc^*$-identification). His machine, on the successive graph points of $f$, outputs a corresponding infinite sequence of programs $p_0, p_1, p_2, \ldots$, and, for some $i$, the programs $p_i, p_{i+1}, p_{i+2}, \ldots$ each compute a variant of $f$ which differs from $f$ at only finitely many argument places. A machine with this property is called general purpose. The sequence $p_i, p_{i+1}, p_{i+2}, \ldots$ is called a final sequence.

For Harrington’s general purpose machine, for distinct $m$ and $n$, the finitely many argument places where $p_{i+m}$ fails to compute $f$ can be very different from the finitely many argument places where $p_{i+n}$ fails to compute $f$. One would hope though, that if Harrington’s machine, or an improvement thereof, inferred the program $p_{i+m}$ based on the data points $f(0), f(1), \ldots, f(k)$, then $p_{i+m}$ would make very few mistakes computing $f$ at the “near future” arguments $k+1, k+2, \ldots, k+\ell$, where $\ell$ is reasonably large. Ideally, $p_{i+m}$’s finitely many mistakes or anomalies would (mostly) occur at arguments $x \le k$, i.e., ideally, its anomalies would be well placed beyond near future arguments. In the present paper, for general purpose learning machines, it is analyzed just how well or badly placed these anomalies may be with respect to near future arguments and what are the various tradeoffs.

In particular, there is good news and bad. Bad news is that, for any
learning machine $M$ (including general purpose $M$), for all $m$, there
exist infinitely many computable functions $f$ such that, infinitely often, $M$
incorrectly predicts $f$’s next $m$ near future values. Good news is that,
for a suitably clever general purpose learning machine $M$, for each
computable $f$, for $M$ on $f$, the density of any such associated bad prediction
intervals of size $m$ is vanishingly small.

* This paper considerably extends and improves the previously unpublished Chapter 5
of [Che81], in particular answering all its open questions.
Considered too is the possibility of providing a general purpose learner
which additionally learns some interesting classes with respect to much
stricter criteria than Bc*-identification. Again there is good news and
bad. The criterion of finite identification requires for success that a
learner $M$ on a function $f$ output exactly one program which correctly
computes $f$. $\mathbf{Bc}^n$-identification is just like Bc*-identification above
except that the number of anomalies in each program of a final sequence is
$\le n$. Bad news is that there is a finitely identifiable class of computable
functions $\mathcal{C}$ such that for no general purpose learner $M$ and for no $n$,
does $M$ additionally $\mathbf{Bc}^n$-identify $\mathcal{C}$. Ex-identification by $M$ on $f$
requires that $M$ on $f$ converges, after a few output programs, to a single
final program which computes $f$. A reliable learner (by definition) never
deceives by false convergence; more precisely: whenever it converges to
a final program on a function $f$, it must Ex-identify $f$. Good news is
that, for any class $\mathcal{C}$ that can be reliably Ex-identified, there is a general
purpose machine which additionally Ex-identifies $\mathcal{C}$!



1 Introduction 

The learning situation often studied in inductive inference [OSW86, JORS98]
may be described as follows. A learner receives as input, one at a time, the
successive graph points of a function $f$. As the learner is receiving its input,
it conjectures a sequence of programs as hypotheses. To be able to learn the
function $f$, the sequence of programs conjectured by the learner must have some
desirable relation to the input function $f$. By appropriately choosing this
desirable relation one gets different criteria of successful learning. One of the first
such criteria studied is called Ex-identification [Gol67, BB75, CS78, CS83].
The learner is said to Ex-identify a function $f$ iff the sequence of programs
output by it on $f$, after a few output programs, converges to a single final program
which computes $f$.¹ A learner is said to Ex-identify a class iff it Ex-identifies
each function in the class. A class of functions is Ex-identifiable iff some machine
Ex-identifies the class.

Even though one cannot Ex-identify the class of all the computable functions
[Gol67], there are large and useful classes of functions which can be Ex-identified.
For example, any recursively enumerable class of computable functions such as
the class of polynomials or the class of primitive recursive functions [Rog67a] is
Ex-identifiable.
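
The enumeration strategy behind this fact can be sketched as follows. This is only an illustration under stated assumptions, not a construction taken from the paper: the names `HYPOTHESES`, `enumeration_learner` and `conjectures` are hypothetical, a short finite list of Python functions stands in for an r.e. class of programs, and `-1` stands in for the conjecture “?”.

```python
from typing import Callable, List, Sequence

# Hypothetical toy class: a few polynomials, listed in a fixed order.
# In the general setting this would be an r.e. list of programs.
HYPOTHESES: List[Callable[[int], int]] = [
    lambda x: 0,
    lambda x: x,
    lambda x: x * x,
    lambda x: x * x + 1,
]

def enumeration_learner(hypotheses: List[Callable[[int], int]],
                        segment: Sequence[int]) -> int:
    """Return the least index whose hypothesis agrees with the segment f[n].

    segment[x] is the value f(x) for x < n.
    """
    for i, h in enumerate(hypotheses):
        if all(h(x) == y for x, y in enumerate(segment)):
            return i
    return -1  # stands in for "?": nothing in the finite list fits

def conjectures(target: Callable[[int], int], steps: int) -> List[int]:
    """Conjectures issued on f[0], f[1], ..., f[steps - 1]."""
    return [enumeration_learner(HYPOTHESES, [target(x) for x in range(n)])
            for n in range(steps)]
```

On a target from the list, the sequence of conjectures changes only finitely often and then stays at the least correct index, which is the convergence behaviour that Ex-identification asks for.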

[Bar74, CS78, CS83] considered a generalization of Ex-identification called
Bc-identification. In Bc-identification of a function $f$ by a machine $M$ one
requires that the sequence of programs output by $M$ on $f$ either converges to a
program for $f$, or the sequence of programs is infinite, with all but finitely many
of them being (possibly different) programs for $f$. [CS83] also considered
the variants of the Ex and Bc-identification criteria in which the final programs
need not be perfect, but are allowed to have some anomalies or mistakes in their
predictions of I/O behavior. For $n$ a natural number, if the final programs are



¹ In general, more formal definitions are in Section 2 below.




John Case, Keh-Jiann Chen, and Sanjay Jain 



allowed to make at most $n$ errors, then the criteria of inference are called $\mathbf{Ex}^n$
and $\mathbf{Bc}^n$ respectively. If the final programs are allowed to make at most finitely
many errors, then the criteria of inference are called $\mathbf{Ex}^*$ and $\mathbf{Bc}^*$ respectively.

Harrington [CS83] constructed a machine which Bc*-identifies each computable
function! In the present paper, we call machines which do this general
purpose. However, on infinitely many computable functions, the final programs
output by Harrington’s machine become more and more degenerate, i.e., the
finite sets of anomalies in successive final output programs, in general, grow in size
without bound. We note that this is a property of any general purpose learner,
and, in fact, the number of anomalies grows faster than any computable bound
(Theorem 2 in Section 3 below).

Since the programs output by any general purpose learning machine make
large numbers of mistakes (on infinitely many computable functions), it would
be interesting to study how these errors are distributed. For example, in real
life one probably cares more about “near future errors” than “distant future
errors”. Based on this motivation, in Section 4 below we define new criteria of
inference called $\mathbf{Bc}^n_m$. Informally, for a machine to $\mathbf{Bc}^n_m$-identify a function $f$, for
its final programs, their predictions on the next $m$ inputs should have at most $n$
errors. In Section 4 we completely resolve the relationship between different $\mathbf{Bc}^n_m$
criteria of inference (Corollary 3 in Section 4). In particular, we show that for
any learning machine $M$ (including general purpose $M$), for all $m$, there exist
infinitely many computable functions $f$ such that, infinitely often, $M$ incorrectly
predicts $f$’s next $m$ near future values (Corollary 4)! Thus there is an ostensibly
unpleasant cost to general purpose learning. As we will see, though, this can be
assuaged at least in some interesting respects described below.

In contrast to the result mentioned above that any general purpose learning
machine $M$ predicts the next $m$ values wrongly infinitely often, we show that the
density of such bad prediction intervals can be made very small (Theorem 6 in
Section 5 below).

A reliable learner (by definition) never deceives by false convergence; more
precisely: whenever it converges to a final program on a function $f$, it must
Ex-identify $f$ [Min76, BB75, CJNM94]. For example, r.e. classes of computable
functions (such as the class of polynomial functions and the class of primitive
recursive functions [Rog67a]) as well as the class of total run time functions
can be reliably Ex-identified [BB75, CS83]. On a further positive note, we show
that for every reliably Ex-identifiable class of computable functions $\mathcal{S}$, there is a
general purpose learning machine which Ex-identifies $\mathcal{S}$ (Theorem 7 in Section 5
below)!

The criterion of finite identification requires for success that a learner $M$ on a
function $f$ output exactly one program which correctly computes $f$. Learning by
finite identification can be thought of as one-shot learning. We show, by contrast
to the result in the immediately above paragraph (Theorem 7), that there is a
class $\mathcal{S}$ which is finitely identifiable, yet for all $n$, no general purpose learner can
additionally $\mathbf{Bc}^n$-identify $\mathcal{S}$ (Corollary 5 in Section 5 below).

We now proceed formally. 
2 Notation and Preliminaries

Recursion-theoretic concepts not explained below are treated in [Rog67b]. $N$
denotes the set of natural numbers. $*$ denotes a non-member of $N$ and is assumed
to satisfy $(\forall n)[n < * < \infty]$. Let $\in, \subseteq, \subset, \supseteq, \supset$, respectively, denote membership,
subset, proper subset, superset and proper superset relations for sets. $\emptyset$ denotes
the empty set. card($S$) denotes the cardinality of set $S$. So “card($S$) $\le *$” means
that card($S$) is finite. min($S$) and max($S$), respectively, denote the minimum
and maximum element in $S$. We take min($\emptyset$) to be $\infty$ and max($\emptyset$) to be $0$.

$\langle \cdot, \cdot \rangle$ denotes a 1-1 computable mapping from pairs of natural numbers onto
natural numbers. $\pi_1, \pi_2$ are the corresponding projection functions. $\langle \cdot, \cdot \rangle$ is
extended to $n$-tuples in a natural way.
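
As a concrete instance, the sketch below uses the Cantor pairing function. The paper does not say which pairing it fixes, so this choice, and the names `pair`, `unpair` and `triple`, are assumptions made only for illustration.

```python
from math import isqrt

def pair(x: int, y: int) -> int:
    """Cantor pairing: a computable bijection from N x N onto N."""
    return (x + y) * (x + y + 1) // 2 + y

def unpair(z: int) -> tuple:
    """Inverse of pair; its two components act as the projection functions."""
    w = (isqrt(8 * z + 1) - 1) // 2   # index of the diagonal x + y = w
    y = z - w * (w + 1) // 2          # position of z on that diagonal
    return w - y, y

def triple(x: int, y: int, z: int) -> int:
    """One natural way to extend pairing to 3-tuples."""
    return pair(x, pair(y, z))
```

Nesting `pair` in the same way gives codes for $n$-tuples of any fixed length.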

$\Lambda$ denotes the empty function. $\eta$, with or without decorations, ranges over
partial functions. $\eta(x)\downarrow$ denotes that $\eta(x)$ is defined. $\eta(x)\uparrow$ denotes that $\eta(x)$ is
not defined. For $a \in N \cup \{*\}$, $\eta_1 =^a \eta_2$ means that $\mathrm{card}(\{x \mid \eta_1(x) \ne \eta_2(x)\}) \le a$.
$\eta_1 \ne^a \eta_2$ means that $\neg[\eta_1 =^a \eta_2]$. (If $\eta_1$ and $\eta_2$ are both undefined on input
$x$, then, as is standard, we take $\eta_1(x) = \eta_2(x)$.) If $\eta =^a f$, then we often call a
program for $\eta$ an $a$-error program for $f$. domain($\eta$) and range($\eta$) respectively
denote the domain and range of the partial function $\eta$.
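
For finite partial functions the relation $=^a$ can be checked directly. The sketch below is an illustration under the assumption that a partial function is represented as a Python dict (a missing key means “undefined”); the names `disagreements` and `eq_a` are hypothetical.

```python
from typing import Dict, Union

def disagreements(eta1: Dict[int, int], eta2: Dict[int, int]) -> int:
    """Number of arguments x on which eta1 and eta2 differ.

    Arguments where both are undefined count as agreement; an argument
    where exactly one of them is defined counts as a disagreement.
    """
    points = set(eta1) | set(eta2)
    return sum(1 for x in points if eta1.get(x) != eta2.get(x))

def eq_a(eta1: Dict[int, int], eta2: Dict[int, int],
         a: Union[int, str]) -> bool:
    """eta1 =^a eta2 for a natural number a, or for a == "*" (finitely many)."""
    if a == "*":
        return True  # two finite dicts can only differ at finitely many points
    return disagreements(eta1, eta2) <= a
```

For example, `{0: 1, 1: 2, 2: 3}` and `{0: 1, 1: 5}` differ at the arguments 1 and 2, so they are related by $=^2$ but not by $=^1$.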

$f$, $g$ and $h$, with or without decorations, range over total functions. $\mathcal{R}$
denotes the class of all computable functions, i.e., total computable functions with
arguments and values from $N$. $\mathcal{C}$ and $\mathcal{S}$, with or without decorations, range over
subsets of $\mathcal{R}$. $\varphi$ denotes a fixed standard or so-called acceptable programming
system [Rog58, Rog67a, Ric80, Ric81, Roy87]. $\varphi_i$ denotes the partial computable
function computed by program $i$ in the $\varphi$-system. Note that in this paper all
programs are interpreted with respect to the $\varphi$-system.

2.1 Function Identification 

We first describe inductive inference machines. We assume, without loss of
generality, that the graph of a function is fed to a machine in canonical order. For
any partial function $\eta$ and $n \in N$ such that, for all $x < n$, $\eta(x)\downarrow$, we let $\eta[n]$
denote the finite initial segment $\{(x, \eta(x)) \mid x < n\}$. Clearly, $\eta[0]$ denotes the
empty segment. We let $\Lambda$ denote the empty segment. SEG denotes the set of all
finite initial segments, $\{f[n] \mid f \in \mathcal{R} \wedge n \in N\}$. We let $\sigma$ and $\tau$, with or without
decorations, range over SEG. Let $|\sigma|$ denote the length of $\sigma$. We often identify
(partial) functions with their graphs. Thus for example, for $\sigma = f[n]$ and for
$x < n$, $\sigma(x)$ denotes $f(x)$. A learning machine (also called an inductive inference
machine (IIM)) [Gol67] is an algorithmic device that computes a mapping from
SEG into $N \cup \{?\}$. Intuitively, “?” above denotes the case when the machine
may not wish to make a conjecture. Although it is not necessary to consider
learners that issue “?” for identification in the limit, it becomes useful when
the number of mind changes a learner can make is bounded. In this paper, we
assume, without loss of generality, that once an IIM has issued a conjecture on
some initial segment of a function, it outputs a conjecture on all extensions of
that initial segment. This is without loss of generality because a machine wishing
to emit “?” after making a conjecture can instead be thought of as repeating its
previous conjecture. We let $M$, with or without decorations, range over learning
machines. Since the set of all finite initial segments, SEG, can be coded onto $N$,
we can view these machines as taking natural numbers as input and emitting
natural numbers or ?’s as output. We say that $M(f)$ converges to $i$ (written:
$M(f)\downarrow = i$) iff $(\forall^\infty n)[M(f[n]) = i]$; $M(f)$ is undefined if no such $i$ exists. The
next definitions describe several criteria of function identification.

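Definition 1. [Gol67, BB75, CS83] Let $a \in N \cup \{*\}$. Let $f \in \mathcal{R}$.

(a) $M$ $\mathbf{Ex}^a$-identifies $f$ (written: $f \in \mathbf{Ex}^a(M)$) just in case there exists an $i$
such that $M(f)\downarrow = i$ and $\varphi_i =^a f$.

(b) $M$ $\mathbf{Ex}^a$-identifies $\mathcal{S}$ iff $M$ $\mathbf{Ex}^a$-identifies each $f \in \mathcal{S}$.

(c) $\mathbf{Ex}^a = \{\mathcal{S} \subseteq \mathcal{R} \mid (\exists M)[\mathcal{S} \subseteq \mathbf{Ex}^a(M)]\}$.

We often write Ex for $\mathbf{Ex}^0$.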

By definition of convergence, only finitely many data points from a function $f$
had been observed by an IIM $M$ at the (unknown) point of convergence. Hence,
some form of learning must take place in order for $M$ to learn $f$. For this reason,
hereafter the terms identify, learn and infer are used interchangeably.

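Definition 2. [Bar74, CS83] Let $a \in N \cup \{*\}$. Let $f \in \mathcal{R}$.

(a) $M$ $\mathbf{Bc}^a$-identifies $f$ (written: $f \in \mathbf{Bc}^a(M)$) iff, for all but finitely many
$n \in N$, $\varphi_{M(f[n])} =^a f$.

(b) $M$ $\mathbf{Bc}^a$-identifies $\mathcal{S}$ iff $M$ $\mathbf{Bc}^a$-identifies each $f \in \mathcal{S}$.

(c) $\mathbf{Bc}^a = \{\mathcal{S} \subseteq \mathcal{R} \mid (\exists M)[\mathcal{S} \subseteq \mathbf{Bc}^a(M)]\}$.

We often write Bc for $\mathbf{Bc}^0$.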

Some relationships between the above criteria are summarized in the follow- 
ing theorem. 

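Theorem 1. [CS83, BB75, Bar74]

$\mathbf{Ex}^0 \subset \mathbf{Ex}^1 \subset \cdots \subset \mathbf{Ex}^* \subset \mathbf{Bc} \subset \mathbf{Bc}^1 \subset \cdots \subset \mathbf{Bc}^* = 2^{\mathcal{R}}$.

Since $\mathcal{R} \in \mathbf{Bc}^*$, we often call a machine which Bc*-identifies $\mathcal{R}$ a general
purpose learning machine.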

We let $\mathbf{I}$ range over identification criteria defined in this paper. There exists
an r.e. sequence $M_0, M_1, M_2, \ldots$ of inductive inference machines such that, for
all criteria $\mathbf{I}$ of inference considered in this paper, one can show that [OSW86]

for all $\mathcal{C} \in \mathbf{I}$, there exists an $i \in N$ such that $\mathcal{C} \subseteq \mathbf{I}(M_i)$.

We assume $M_0, M_1, M_2, \ldots$ to be one such sequence of machines.

3 General Purpose Machines and Their Mistakes 

Unfortunately, the programs output by Harrington’s machine become more and 
more degenerate, i.e. the finite set of anomalies in final programs output grows in 
size without bound. In fact the finite sets of anomalies cannot even be bounded 
by a computable function as the next theorem shows. 
Theorem 2. Suppose $\mathcal{R} \subseteq \mathbf{Bc}^*(M)$. Let $g$ be a computable function. Then there
exist infinitely many $f \in \mathcal{R}$ such that, for infinitely many $n$, $\varphi_{M(f[n])} \ne^{g(n)} f$.



4 Predicting Near Future Values 



Based on Theorem 2, it would be interesting to study how the anomalies of the
programs output by a machine are distributed. For example, in real life one
probably cares more about “near future errors” than “distant future errors”.
This leads us to the following definition.

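Definition 3. $M$ $\mathbf{Bc}^n_m$-identifies $f$ (written: $f \in \mathbf{Bc}^n_m(M)$) iff, for all but
finitely many $x$, $\mathrm{card}(\{z < m \mid \varphi_{M(f[x])}(x + z) \ne f(x + z)\}) \le n$.

$M$ $\mathbf{Bc}^n_m$-identifies $\mathcal{C}$ if it $\mathbf{Bc}^n_m$-identifies each $f \in \mathcal{C}$.

$\mathbf{Bc}^n_m = \{\mathcal{C} \mid \text{some } M\ \mathbf{Bc}^n_m\text{-identifies } \mathcal{C}\}$.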

Intuitively, one can view $\mathbf{Bc}^n_m$-identification of a function by a machine as
follows. At any stage, the learning machine predicts the next $m$ values. At all
but finitely many stages, at least $m - n$ out of the $m$ predictions are correct.

In this section we resolve the relationship between different $\mathbf{Bc}^n_m$-identification
criteria.
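
The per-stage condition in Definition 3 can be checked mechanically on a finite horizon. In the sketch below, which is only an illustration, a conjectured program is modelled as a total Python function, and the names `near_future_errors`, `bad_stages` and `last_value_learner` are hypothetical. A finite horizon can of course only exhibit bad stages; it cannot establish “all but finitely many”.

```python
from typing import Callable, List, Sequence

Program = Callable[[int], int]

def near_future_errors(program: Program, f: Program, x: int, m: int) -> int:
    """Number of z < m with program(x + z) != f(x + z)."""
    return sum(1 for z in range(m) if program(x + z) != f(x + z))

def bad_stages(learner: Callable[[Sequence[int]], Program], f: Program,
               n: int, m: int, horizon: int) -> List[int]:
    """Stages x < horizon at which the conjecture made on f[x]
    commits more than n errors on the arguments x, ..., x + m - 1."""
    bad = []
    for x in range(horizon):
        program = learner([f(i) for i in range(x)])
        if near_future_errors(program, f, x, m) > n:
            bad.append(x)
    return bad

def last_value_learner(segment: Sequence[int]) -> Program:
    """Toy learner: conjecture the constant function with the last seen value."""
    last = segment[-1] if segment else 0
    return lambda x: last
```

For the step function that is 0 below 5 and 1 from 5 on, this toy learner is bad for $n = 1$, $m = 3$ only at the stages just before and at the jump.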

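The following four propositions follow directly from the definitions.

Proposition 1. For $m > n$, $\mathbf{Bc}^n \subseteq \mathbf{Bc}^n_m$.

Proposition 2. Suppose $m > n$ and $k \in N$. Then $\mathbf{Bc}^n_m \subseteq \mathbf{Bc}^{n+k}_{m+k}$.

Proposition 3. Suppose $m > n \ge k$. Then $\mathbf{Bc}^k_m \subseteq \mathbf{Bc}^n_m$.

Proposition 4. Suppose $m \ge k > n$. Then $\mathbf{Bc}^n_m \subseteq \mathbf{Bc}^n_k$.

$NV''$, defined by Podnieks [Pod74], is the same as $\mathbf{Bc}^0_1$. The following proposition
can thus be proved by using the equality $\mathbf{Bc} = NV''$.

Proposition 5. For all $m > 0$, $\mathbf{Bc} = \mathbf{Bc}^0_m$.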

The following theorem shows some advantages of having to predict fewer
correct values in the near future.

Theorem 3. Suppose $m' > m$. Then $\mathbf{Bc}^1_m - \mathbf{Bc}^{m'-m}_{m'} \ne \emptyset$.

Proof. Let

$Z_k = \{x \mid k \cdot m' < x < k \cdot m' + m\}$,

$E_k = \{k \cdot m'\} \cup \{x \mid k \cdot m' + m \le x < (k + 1) \cdot m'\}$,

and

$U_k = Z_k \cup E_k = \{x \mid k \cdot m' \le x < (k + 1) \cdot m'\}$.
We now consider the following two properties defined on total functions.

(PropA) $f$ satisfies PropA iff, for all $k$, for all $x \in Z_k$, $f(x) = 0$.

(PropB) $f$ satisfies PropB iff, for all $k$, for all $x \in E_k$, $f(x) = f(k \cdot m')$.

Let $\mathcal{C} = \{f \in \mathcal{R} \mid f \text{ satisfies PropA and PropB}\}$.

The above class is easily seen to be in $\mathbf{Bc}^1_m$. The proof of $\mathcal{C} \notin \mathbf{Bc}^{m'-m}_{m'}$ uses a
complicated diagonalization argument and is omitted due to lack of space. □
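
The “easily seen” half can be explored numerically. The sketch below is not the paper's argument; it assumes the reading $Z_k = \{x \mid k m' < x < k m' + m\}$ and $E_k = \{k m'\} \cup \{x \mid k m' + m \le x < (k+1) m'\}$, writes `mp` for $m'$, and uses hypothetical helper names. It counts, for a window of $m$ consecutive arguments starting at $x$, the points whose value is not yet determined by $f[x]$ together with PropA and PropB.

```python
from typing import List, Set

def Z(k: int, m: int, mp: int) -> Set[int]:
    """Arguments in block k on which PropA forces the value 0."""
    return set(range(k * mp + 1, k * mp + m))

def E(k: int, m: int, mp: int) -> Set[int]:
    """Arguments in block k on which PropB forces the value f(k * mp)."""
    return {k * mp} | set(range(k * mp + m, (k + 1) * mp))

def U(k: int, m: int, mp: int) -> Set[int]:
    """The whole block k."""
    return set(range(k * mp, (k + 1) * mp))

def fresh_points(x: int, m: int, mp: int) -> List[int]:
    """Points y in x, ..., x + m - 1 that lie in some E_k whose
    defining value f(k * mp) has not been seen yet (k * mp >= x)."""
    pts = []
    for y in range(x, x + m):
        k = y // mp
        if y in E(k, m, mp) and k * mp >= x:
            pts.append(y)
    return pts
```

Under these assumptions every window of length $m$ contains at most one such point, which is consistent with a learner that predicts 0 on each $Z_k$ and the already seen block value on each $E_k$ making at most one error per window.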

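The next theorem shows some advantages of being allowed to predict more
wrong values in the near future.

Theorem 4. $\mathbf{Bc}^{n+1} - \mathbf{Bc}^{n}_{n+1} \ne \emptyset$.

Proof. Let $Z_f = \{x \mid f((n + 2) \cdot x) \ne 0\}$.

Let $\mathcal{C} = \{f \mid [\mathrm{card}(Z_f) = \infty \wedge (\forall^\infty x \in Z_f)[\varphi_{f((n+2) \cdot x)} =^{n+1} f]] \vee [0 <
\mathrm{card}(Z_f) < \infty \wedge \varphi_{f((n+2) \cdot \max(Z_f))} =^{n+1} f]\}$.

It is easy to verify that $\mathcal{C} \in \mathbf{Bc}^{n+1}$. A diagonalization argument can be used
to show that $\mathcal{C} \notin \mathbf{Bc}^{n}_{n+1}$. We omit the details due to lack of space. □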

As a corollary to the above theorem, if one looks at the errors committed
by a general purpose machine on the next $n$ inputs, then for infinitely many
functions, at infinitely many positions, the machine commits $n$ errors in
predicting the next $n$ inputs. Hence, there is an ostensibly unpleasant cost to general
purpose learning. However, as we shall see, this can be assuaged at least in some
interesting respects (Theorems 6 and 7 in Section 5 below).

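Corollary 1. Suppose $m > n$ and $m' > n'$. If $n > n'$, then $\mathbf{Bc}^n_m - \mathbf{Bc}^{n'}_{m'} \ne \emptyset$.

Corollary 2. Suppose $m' > n'$ and $m > n > 0$. If $m' - n' > m - n$, then
$\mathbf{Bc}^n_m - \mathbf{Bc}^{n'}_{m'} \ne \emptyset$.

Proof. If $n' < n$, then the corollary follows from Corollary 1. So suppose $n \le n'$.
Theorem 3 shows that $\mathbf{Bc}^1_{m-n+1} - \mathbf{Bc}^{m'-m+n-1}_{m'} \ne \emptyset$ (note that $m' > m - n + 1$).
Now, $\mathbf{Bc}^n_m \supseteq \mathbf{Bc}^1_{m-n+1}$ (by Proposition 2), and $\mathbf{Bc}^{n'}_{m'} \subseteq \mathbf{Bc}^{m'-m+n-1}_{m'}$ (since
$n' \le m' - m + n - 1$, and by Proposition 3). The corollary follows. □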

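The following corollary resolves all relationships among the $\mathbf{Bc}^n_m$-criteria.

Corollary 3. Suppose $m > n$ and $m' > n'$. Then: $\mathbf{Bc}^n_m \subseteq \mathbf{Bc}^{n'}_{m'}$ iff [$n = 0$ or
[$n' \ge n$ and $m' - n' \le m - n$]].

Proof. If $n > n'$, then Corollary 1 shows that $\mathbf{Bc}^n_m - \mathbf{Bc}^{n'}_{m'} \ne \emptyset$. If $m' - n' > m - n$
and $n > 0$, then Corollary 2 shows that $\mathbf{Bc}^n_m - \mathbf{Bc}^{n'}_{m'} \ne \emptyset$.

If $n = 0$, then $\mathbf{Bc}^n_m = \mathbf{Bc} \subseteq \mathbf{Bc}^{n'}_{m'}$.

If $n' \ge n$ and $m' - n' \le m - n$, then $\mathbf{Bc}^n_m \subseteq \mathbf{Bc}^{n'}_{m+n'-n}$ (by Proposition 2)
and $\mathbf{Bc}^{n'}_{m+n'-n} \subseteq \mathbf{Bc}^{n'}_{m'}$ (by Proposition 4). The corollary follows. □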
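
The characterization in Corollary 3 is a purely arithmetic test on the four parameters. The sketch below encodes it under the reading “$n = 0$, or $n' \ge n$ and $m' - n' \le m - n$”; the function name `bc_included` and the argument names `n2`, `m2` (for $n'$, $m'$) are hypothetical.

```python
def bc_included(n: int, m: int, n2: int, m2: int) -> bool:
    """Decide whether Bc^n_m is contained in Bc^{n2}_{m2}, assuming the
    characterization: n == 0, or (n2 >= n and m2 - n2 <= m - n).

    Requires m > n >= 0 and m2 > n2 >= 0.
    """
    if not (m > n >= 0 and m2 > n2 >= 0):
        raise ValueError("need m > n >= 0 and m2 > n2 >= 0")
    return n == 0 or (n2 >= n and m2 - n2 <= m - n)
```

Informally, $m - n$ is the number of correct predictions demanded per window, so inclusion holds when the second criterion tolerates at least as many errors and demands no more correct predictions.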
Corollary 4. For all $m > n$, $\mathcal{R} \notin \mathbf{Bc}^n_m$.

Thus, no general purpose learning machine can guarantee that anomalies are
not concentrated in the near future.



5 Desirable Properties Achievable by General Purpose 
Learners 

Since the errors committed by programs output by a general purpose learner
can be arbitrarily bad, we look at how this may be assuaged for suitable general
purpose learners, and we also determine some additional nice properties a general
purpose learner can satisfy.

One can think of a program for a computable function as a predictive
explanation for the function’s I/O behavior [BB75, CS83]. Popper’s Refutability
Principle [Pop68] essentially says that explanations with mistakes should be
refutable. As pointed out in [CS83] (see also [CJNM94]), an erroneous predictive
explanation (program) for a computable function satisfies Popper’s Principle if
it computes a total function.² The following theorem says that one can construct
a general purpose learner which, on computable function input, almost always
outputs programs for total functions; hence, it almost always outputs predictive
explanations which satisfy Popper’s Principle.

Theorem 5. There exists a machine $M$ such that, for all $f \in \mathcal{R}$, (i) $M$ Bc*-
identifies $f$, and (ii) $(\forall^\infty n)[\varphi_{M(f[n])} \in \mathcal{R}]$.

The following shows that even though a general purpose machine may be 
locally bad for infinitely many positions, one can ensure that these bad positions 
have low density. 

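Theorem 6. For all $n$, there exists a machine $M$ such that,

(a) $M$ Bc*-identifies $\mathcal{R}$, and

(b) for all $f \in \mathcal{R}$, $\lim_{\ell \to \infty} \frac{\mathrm{card}(\{k \le \ell \mid (\forall z < n)[\varphi_{M(f[k])}(k+z) = f(k+z)]\})}{\ell} = 1$.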
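
The density in part (b) can be estimated on a finite horizon. The sketch below is an illustration only: conjectured programs are modelled as total Python functions, the toy learner is not the machine of Theorem 6, and the names `good_density` and `last_value_learner` are hypothetical.

```python
from fractions import Fraction
from typing import Callable, Sequence

Program = Callable[[int], int]

def good_density(learner: Callable[[Sequence[int]], Program], f: Program,
                 n: int, horizon: int) -> Fraction:
    """Fraction of stages k < horizon at which the conjecture made on f[k]
    predicts all of f(k), ..., f(k + n - 1) correctly."""
    good = 0
    for k in range(horizon):
        program = learner([f(i) for i in range(k)])
        if all(program(k + z) == f(k + z) for z in range(n)):
            good += 1
    return Fraction(good, horizon)

def last_value_learner(segment: Sequence[int]) -> Program:
    """Toy learner: conjecture the constant function with the last seen value."""
    last = segment[-1] if segment else 0
    return lambda x: last
```

For a target whose value changes only at powers of two, such as `lambda x: x.bit_length()`, this toy learner mispredicts near every jump, yet the fraction of good stages grows with the horizon, which illustrates “infinitely many bad intervals, but of small density”.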

Since general purpose learners are always quite erroneous (of course the den- 
sity of erroneous, near future intervals can be made small), it is interesting to 
consider which classes a general purpose learner may additionally identify in a 
better or stricter sense. 

Definition 4. [Min76, BB75, CJNM94] $M$ is said to be reliable iff, for all $f$
such that $M(f)\downarrow$, $M$ Ex-identifies $f$.

$M$ is said to reliably Ex-identify $\mathcal{C}$ iff $M$ is reliable and $M$ Ex-identifies $\mathcal{C}$.
$\mathbf{RelEx} = \{\mathcal{C} \mid \text{some machine reliably Ex-identifies } \mathcal{C}\}$.



² Then the halting problem [Rog67a] does not stand in the way of algorithmically
locating the mistakes.
Intuitively, reliable machines do not deceive us by converging falsely. As noted
above, r.e. classes of computable functions (such as the class of polynomial
functions and the class of primitive recursive functions) as well as the class of total
run time functions can be reliably Ex-identified.

The following theorem shows that for any class $\mathcal{S}$ in RelEx, one can create
a general purpose learning machine which Ex-identifies $\mathcal{S}$!

Theorem 7. Suppose $\mathcal{S} \in \mathbf{RelEx}$. Then there exists an $M$ such that $M$ Bc*-
identifies $\mathcal{R}$ and Ex-identifies $\mathcal{S}$.

On the other hand, the following corollary to the proof of Theorem 2 shows
that RelEx cannot be replaced by finite identification [CS83].

Let $\mathcal{S}_0 = \{f \mid \varphi_{f(0)} = f\}$.

The proof of Theorem 2 can be used to show the following:

Corollary 5. Suppose $n \in N$ and $M$ Bc*-identifies $\mathcal{R}$. Then there exists an
$f \in \mathcal{S}_0$ such that $M$ does not $\mathbf{Bc}^n$-identify $f$.

It would be interesting to study what other useful properties a suitable gen- 
eral purpose learner can be made to satisfy. 

6 Conclusions 

Harrington [CS83] surprisingly constructed a general purpose learner, i.e., a
machine which Bc*-identifies all the computable functions. However, the programs
output by Harrington’s machine become more and more degenerate, i.e., in
general, the finite set of anomalies in each final program grows without bound. In
this paper we showed that this is unavoidable (Theorem 2 above).

Since the programs output by any general purpose learning machine make
a large number of errors on infinitely many functions, it is interesting to study
how these errors are or can be distributed. Based on this motivation we defined
new criteria of inference called $\mathbf{Bc}^n_m$, and completely resolved the relationship
between different $\mathbf{Bc}^n_m$ criteria of inference. Among other results, we showed that
any general purpose learning machine is poor in predicting near future values. In
particular, any general purpose learning machine $M$ predicts the next $n$ values
wrongly infinitely often. In contrast, though, we showed that the density of such
bad prediction points can be made vanishingly small (Theorem 6 above).

We constructed a general purpose learning machine M such that, on any 
computable function input, all but finitely many of the programs output by 
M are for total functions. Hence, almost all of its conjectures satisfy Popper’s 
Refutability Principle. 

We also showed that for every class of computable functions, $\mathcal{S}$, which can
be Ex-identified by a reliable machine [Min76, BB75, CJNM94] (see definition
in Section 5 above), some general purpose learning machine additionally
Ex-identifies $\mathcal{S}$. We further showed, though, that reliable identification in the just
above statement cannot be replaced by finite identification.

It would be interesting to study which other useful properties a general pur- 
pose learner can or cannot have. 
Abstract. The equivalence of the universal distribution, the a priori
probability and the negative exponential of Kolmogorov complexity is a 
well known result. The natural analogs of Kolmogorov complexity and 
of a priori probability in the time-bounded setting are not efficiently 
computable under reasonable assumptions. In contrast, it is known that 
for every polynomial p, distributions universal for the class of p-time 
computable distributions can be computed in polynomial time. We show 
that in the time-bounded setting the universal distribution gives rise to 
sensible notions of Kolmogorov complexity and of a priori probability. 

1 Introduction 

A universal distribution can be considered as representing the whole class of
distributions, since for any distribution, the probability of an instance $x$ is bounded
(up to some constant factor) by the probability of x with respect to the univer- 
sal distribution. To make this more precise consider the following two examples. 
In [7], Li and Vitanyi show that any concept class that is PAC-learnable with 
respect to the universal distribution, is PAC-learnable with respect to any sim- 
ple (i.e. r.e.) distribution provided the examples (in the learning phase) are 
given with respect to the universal distribution. A similar result holds if we 
consider universal distributions and PAC-learning with respect to the class of 
t-computable distributions. 

As a second example observe that the average-case complexity (of any algo- 
rithm) with respect to the universal distribution gives an upper bound on the 
average-case complexity with respect to any distribution of the class. In fact 0 
(see also jS|), in the unbounded case, worst-case complexity and average-case 
complexity with respect to the universal distribution coincide. (As an average- 
case complexity one might consider the expected value w.r.t. the conditional 
probability restricted to strings of length n) . 

In the unbounded case, an enumeration $\mu_1, \mu_2, \ldots$ of the r.e. semimeasures
can be used to define a universal distribution (semimeasure) $m$ (L. A. Levin,
cf. [7]). For all $x \in \Sigma^*$ let

$$m(x) = \sum_{n} \alpha(n) \cdot \mu_n(x), \qquad (1)$$

where $\alpha$ is some function such that $\sum_n \alpha(n) \le 1$, e.g., $\alpha(n) = 1/n(n+1)$.
However, a universal distribution can be defined equivalently using the algorithmic
or the a priori probability on the strings [b].

C. Meinel and S. Tison (Eds.): STACS’99, LNCS 1563, pp. 434-^5^] 1999. 
(c) Springer- Verlag Berlin Heidelberg 1999 
Let $U$ be a universal prefix Turing machine. ($M$ is a prefix Turing machine
if for all $x$ and $y$, if $M(x)$ stops then $M$ on input $xy$ stops without reading
any bit of $y$.) The a priori probability $Q_U$ is defined by $Q_U(x) = \sum_p 2^{-|p|}$,
where $p$ ranges over all strings such that $U(p) = x$ but $U(p')$ does not stop for
any proper prefix $p'$ of $p$. The algorithmic probability $R$ of an instance $x$ is defined
by $R(x) = 2^{-K(x)}$, where $K(x)$ is the prefix Kolmogorov complexity of $x$, i.e.,

$K(x) = \min\{|p| : p \in \Sigma^*,\ U(p) \text{ halts with output } x\}$.

Theorem 1. (Levin, cf. [7]) For some constant $c$ and all strings $x$, $-\log m(x) =
-\log Q_U(x) = -\log R(x) = K(x)$ up to an additive constant less than $c$.
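
The two probabilities can be compared on a toy machine. The sketch below is only an illustration: `TOY_MACHINE` is a hypothetical finite table with a prefix-free set of halting programs, not a universal prefix machine, and the function names are assumptions.

```python
from fractions import Fraction
from typing import Dict, Iterable

# Hypothetical machine: halting programs (a prefix-free set) and their outputs.
TOY_MACHINE: Dict[str, str] = {"0": "a", "10": "b", "110": "a", "111": "b"}

def is_prefix_free(programs: Iterable[str]) -> bool:
    """No program is a proper prefix of another one."""
    ps = list(programs)
    return not any(p != q and q.startswith(p) for p in ps for q in ps)

def a_priori(x: str) -> Fraction:
    """Q(x): sum of 2^-|p| over all halting programs p with output x."""
    return sum((Fraction(1, 2 ** len(p))
                for p, out in TOY_MACHINE.items() if out == x), Fraction(0))

def K(x: str) -> int:
    """Length of a shortest program with output x."""
    return min(len(p) for p, out in TOY_MACHINE.items() if out == x)

def algorithmic(x: str) -> Fraction:
    """R(x) = 2^-K(x)."""
    return Fraction(1, 2 ** K(x))
```

Since a shortest program contributes the single term $2^{-K(x)}$ to the sum defining $Q(x)$, the inequality $R(x) \le Q(x)$ holds for every machine; the content of Theorem 1 is that for a universal prefix machine the two agree up to a constant factor.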

Recently, it has been shown that for some polynomial $p$ and any nondecreasing
time-constructible function $t$, there exists an enumeration $\mu_1, \mu_2, \ldots$ of $p(t)$-
computable semimeasures containing all $t$-computable semimeasures nm. Hence,
using Equation (1), a semimeasure $m^t$ universal for the class of $t$-computable
distributions is computable in time polynomial in $t$. In this paper we consider
the question of whether equivalent definitions in terms of time-bounded a priori
and time-bounded algorithmic probability exist.

First we note in Proposition 1 that the time-bounded prefix Kolmogorov
complexity of a string $x$ is not computable in time polynomial in the time bound
unless no polynomial-time computable pseudo-random generator exists (for
definitions see next section). We then propose a restriction on Turing machines and
show that for this model universal Turing machines exist. For a function $t$ let $Q_U^t$
and $R^t$ denote the time-bounded versions of a priori and algorithmic probability
(using this machine model). Then we can show the following theorem.

Theorem 2. Let t be a nondecreasing time-constructible function. There is a 
constant c, polynomials pi, p 2 , and p^ such that for all strings x, — log mt(x) < 

— log (a^) < ~ log ~ logm.p 3 (t)(a;) up to an additive constant 

less than c. 

In this paper we follow the notation and definitions given in in particular 
Chapter 4 and Section 7.6. 

2 Preliminaries 

We consider distributions on the sample space $\Sigma^*$, where $\Sigma = \{0, 1\}$. More
formally, a semimeasure is defined by a function $\mu$ from $\Sigma^*$ to the real interval
$[0, 1]$ such that $\sum_{x \in \Sigma^*} \mu(x) \le 1$. If the sum converges to 1 then $\mu$ defines a
distribution (or probability measure). The function $\mu$ is called the density function;
the distribution function $\mu^*$ is defined by $\mu^*(x) = \sum_{y \le x} \mu(y)$ for all $x$.

A distribution (semimeasure) $\mu$ dominates a distribution $\nu$ if $c \cdot \mu(x) \ge \nu(x)$
for some constant $c$ and all $x \in \Sigma^*$. Let $\mathcal{F}$ be a class of distributions and $h$ be a
function from $\Sigma^*$ to $\mathbb{N}$. A distribution $\mu$ is $h$-universal for $\mathcal{F}$ if $h \cdot \mu$ dominates
every distribution $\nu$ in $\mathcal{F}$. If $h$ is a constant then $\mu$ is universal for $\mathcal{F}$.

We will be considering two classes of distributions. A distribution $\mu$ is called
simple 0 if it is dominated by a recursively enumerable semimeasure $\nu$. (That
is, the set $\{(x, p, q) \mid p/q < \nu(x)\}$ is r.e.) Let $t$ be a nondecreasing
time-constructible function and $\mu$ be a distribution. $\mu$ is called $t$-computable if the
distribution function $\mu^*$ is computable in time $t$, i.e., $\mu^*(x) = f(x)/g(x)$ for
some functions $f$, $g$ computable in time $t$. $\mu$ is polynomial-time computable if it is
$p$-time computable for some polynomial $p$. Note that, following [51411 1 . we require
that the distribution function $\mu^*$ is $t$-computable (instead of the density
function $\mu$). In fact, if $\mu^*$ is polynomial-time computable then $\mu$ is polynomial-time
computable; the converse is not true unless $P = NP$ @|.

We will be using three different notions of ordering of strings. A string $x$
is lexicographically smaller than a string $y$, denoted by $x < y$, if $|x| < |y|$, or
$|x| = |y|$ and $z0$ is a prefix of $x$ and $z1$ is a prefix of $y$ for some $z$. For a string
$x = x_{k-1} \ldots x_0$, $x_i \in \{0, 1\}$, let $0.x$ denote the rational number $\sum_{i \le k-1} x_i \cdot 2^{i-k}$.
Then the value of $x$ is smaller than the value of $y$ if $0.x < 0.y$. For a string $x$ let
Left($x$) be defined as follows.

Left($x$) = $\{y \mid \exists z : z0 \text{ is a prefix of } y \text{ and } z1 \text{ is a prefix of } x\}$

E.g., 010 is not in Left(0101) but 0100 is in Left(0101).
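
The three notions can be made concrete as follows. This is a sketch with hypothetical function names; the membership test for Left is written to agree with the example just given (0100 is in Left(0101), while 010 is not).

```python
from fractions import Fraction

def lex_smaller(x: str, y: str) -> bool:
    """Length-first lexicographic order on binary strings."""
    return (len(x), x) < (len(y), y)

def value(x: str) -> Fraction:
    """The dyadic rational 0.x, reading x as the digits after the binary point."""
    return Fraction(int(x, 2), 2 ** len(x)) if x else Fraction(0)

def in_left(y: str, x: str) -> bool:
    """True iff y is in Left(x): for some z, z0 is a prefix of y
    and z1 is a prefix of x."""
    for i in range(min(len(x), len(y))):
        if x[:i] == y[:i] and y[i] == "0" and x[i] == "1":
            return True
    return False
```

Note that the three orderings differ: `"1"` is lexicographically smaller than `"00"` because it is shorter, although its value 1/2 is larger than the value 0 of `"00"`.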

2.1 Time-Bounded Prefix Complexity 

First let us note that a universal distribution defined using the time-bounded
version of the prefix Kolmogorov complexity is not likely to be computable in
time polynomial in the time bound. Let $t$ be a nondecreasing time-constructible
function. The $t$-time bounded prefix Kolmogorov complexity $K^t(x)$ of a string $x$
is defined as follows. Let $U$ be a universal prefix Turing machine, then

$K^t(x) = \min\{|p| : p \in \Sigma^*,\ U(p) \text{ halts with output } x \text{ within} \le t(|x|) \text{ steps}\}$.

The semimeasure $m_{(t)}$ is defined as follows [7]. For all $x$, $m_{(t)}(x) = 2^{-K^{t'}(x)}$,
where $t'(n) = t(n) \cdot \log t(n)$ for all $n$. Recall that for all $p$ and $p'$, if $U(p)$ stops
and $p$ is a prefix of $p'$ then $U(p) = U(p')$. Hence it follows from König’s Lemma
that $\sum_p 2^{-|p|} \le 1$, where $p$ ranges over all strings such that $U(p)$
stops in time $t(|U(p)|)$ but $U(p')$ does not stop for any proper prefix $p'$ of $p$. Therefore,
$m_{(t)}$ indeed defines a semimeasure.

Theorem 3. [7] Let $t$, $t(n) \ge n$, be a nondecreasing time-constructible function.
The distribution $m_{(t)}$ is universal for the class of $t$-computable distributions.

From the definition of the time-bounded prefix Kolmogorov complexity it 
follows that is computable in time 0(nt(n)2") and hence is computable 
in time 0(nt(n)2^”). It is not difficult to see that under the assumption that 
$P = NP$, $m_{(t)}$ is computable in time polynomial in $n$ and $t$. However, without
any (unproven) assumption, it is not known whether $m_{(t)}$ is computable in time
polynomial in $n$ and $t$ [2]. The next proposition shows that this is unlikely,
since it is generally believed that polynomial-time computable pseudo random
generators exist.

A function G : A" ^ is a pseudo random generator Pj, if for every set 
AgV and infinitely many n \\Vr^^x:‘^n[A{x) = 1] —Vixes^lAiG^x)) = 1]|| < 
Proposition 1. If for every polynomial $t$ the distribution $m_{(t)}$ is dominated by
a polynomial-time computable distribution, then no polynomial-time computable
pseudo random generator exists.

Proof. Let $G : \Sigma^{n/2} \to \Sigma^n$ be polynomial-time computable. Choose a constant $k$

such that G is computable in time O(n^). Let t{n) = for all n, and assume 
that a polynomial-time computable distribution $\mu$ dominates $m_{(t)}$. Choose a
constant $c$ such that $c \cdot \mu(x) \ge m_{(t)}(x) = 2^{-K^{t'}(x)}$ for all $x$, where $t'(n) =
t(n) \cdot \log t(n)$ for all $n$. This shows that for all $x$, $K^{t'}(x) \ge -\log \mu(x) - \log c$.

Let $A = \{x \mid \mu(x) < 2^{-(2/3)|x|}\}$. Observe that $A \in \mathrm{P}$ and for every string
$x$ in $A$, $K^{t'}(x) \ge -\log \mu(x) - \log c > (2/3)|x| - \log c$. On the other hand, the
strings generated by $G$ have small $K^{t'}$ complexity, which can be seen as follows.
If $x$ is the output of $G$ on input $y$ for some $y$, then $x$ can be generated by $U$
from a self-delimiting description of $G$ followed by a self-delimiting description
of $y$. Since $U$ can be simulated in time $t(n)$, there exists a constant $d$
independent of $x$, such that $K^{t'}(x) \le |y| + 2\log|y| + d \le |x|/2 + 2\log|x| + d$.
Hence, there exists a constant $n_0$ such that $G(y) \notin A$ for all $y$, $|y| > n_0$.

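Since $\sum_{|x|=n} \mu(x) \le 1$ it follows that $\|A^{=n}\| \ge 2^n - 2^{2n/3}$. (Assume otherwise;
then $\sum_{|x|=n} \mu(x) \ge \sum_{|x|=n,\ x \notin A} \mu(x) > 2^{2n/3} \cdot 2^{-2n/3} = 1$.) Hence for all
$n > n_0$, $|\Pr_{x \in \Sigma^{n}}[A(x) = 1] - \Pr_{x \in \Sigma^{n/2}}[A(G(x)) = 1]| \ge \|A^{=n}\| / 2^n \ge 1 - 2^{-n/3}$.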

Therefore G is not a pseudo random generator. 



2.2 Shannon-Fano Codes 

A semimeasure $\mu$ defines a prefix code $E$, called Shannon-Fano code, on strings
as follows pj. Here and in the following we assume w.l.o.g. that $\mu(x) > 0$ for all
$x$. For example, let $\mu$ be any distribution and let $\mu_{st}$ be defined by $\mu_{st}(\lambda) = 1/2$
and $\mu_{st}(x) = (2|x|(|x| + 1)2^{|x|})^{-1}$ for all $x \ne \lambda$. Then $\mu'(x) = (\mu(x) + \mu_{st}(x))/2$
dominates $\mu$ and $\mu'(x) > 0$ for all $x \in \Sigma^*$.
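
As a quick check that $\mu_{st}$ is a probability distribution, the masses can be summed by length. The computation below is a sketch added for illustration; it assumes the binary alphabet and reads the definition as $\mu_{st}(\lambda) = 1/2$ and $\mu_{st}(x) = (2|x|(|x|+1)2^{|x|})^{-1}$ for $x \ne \lambda$.

```latex
\sum_{x \ne \lambda} \mu_{st}(x)
  = \sum_{n \ge 1} 2^{n} \cdot \frac{1}{2n(n+1)2^{n}}
  = \frac{1}{2} \sum_{n \ge 1} \left( \frac{1}{n} - \frac{1}{n+1} \right)
  = \frac{1}{2},
\qquad\text{so}\qquad
\mu_{st}(\lambda) + \sum_{x \ne \lambda} \mu_{st}(x) = 1 .
```

Mixing any distribution with $\mu_{st}$ in equal parts therefore costs at most a factor 2 in probability while making every string strictly positive.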

Consider the interval $[0, 1)$ partitioned into disjoint half-open intervals $I(x)$
of size $\mu(x)$ as follows. $I(x) = [\mu^*(x^-), \mu^*(x))$, where $x^-$ is the predecessor of $x$.
Recall that $\mu^*(x) = \sum_{y \le x} \mu(y)$, i.e., $\mu(x) = \mu^*(x) - \mu^*(x^-)$. Let $p$ be a string.
Then $[0.p,\ 0.p + 2^{-|p|})$ is the binary interval defined by $p$. Recall that we use $0.p$
to denote the dyadic rational number defined by $p_{k-1} \cdot 2^{-1} + \cdots + p_0 \cdot 2^{-k}$, where
$p = p_{k-1} \cdots p_0$, $p_i \in \{0, 1\}$. Let $I$ be any interval and $l$ be the size of a largest
binary interval contained in $I$. Then $I$ is covered by at most 4 intervals of size $l$.

The Shannon-Fano code of $x$, $E(x)$, is the lexicographically first string $p$
which defines a largest binary interval which is contained in $I(x)$. The code $E$
has the following properties:

- $E$ is a prefix code since for all $p'$, if $p$ is a prefix of $p'$ then the binary interval
defined by $p'$ is contained in the binary interval defined by $p$.

- $E$ is ordered, i.e., for all $x$ and $x'$, if $x < x'$ then $E(x) \in \mathrm{Left}(E(x'))$.

- for all $x$, $-\log(\mu(x)) \le |E(x)| \le -\log(\mu(x)) + 2$, since $\mu(x) \le 4 \cdot 2^{-|E(x)|}$
and $2^{-|E(x)|} \le \mu(x)$.
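
The construction can be made concrete with exact rational arithmetic. The following is a minimal sketch, not the paper's algorithm: it assumes a finite list of strings given in increasing order with positive rational weights, treats binary intervals as half-open, and uses the hypothetical name `shannon_fano_codes`.

```python
from fractions import Fraction
from math import ceil

def shannon_fano_codes(weights):
    """weights: list of (string, probability) pairs in increasing order of the
    strings; probabilities are positive Fractions summing to at most 1."""
    codes = {}
    lo = Fraction(0)
    for x, mu in weights:
        hi = lo + mu                      # I(x) = [lo, hi)
        length = 0
        while True:
            j = ceil(lo * 2**length)      # leftmost grid point j / 2^length >= lo
            if Fraction(j + 1, 2**length) <= hi:
                break                     # [j/2^length, (j+1)/2^length) fits in I(x)
            length += 1
        # Shortest length gives a largest interval; smallest j gives the
        # lexicographically first string of that length.
        codes[x] = format(j, "b").zfill(length) if length else ""
        lo = hi
    return codes

third = Fraction(1, 3)
print(shannon_fano_codes([("x", third), ("y", third), ("z", third)]))
```

For three strings of weight 1/3 each, the resulting code words are prefix-free, ordered from left to right, and at most two bits longer than $-\log \mu(x)$, in line with the three properties listed above.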
Let the inverse of the Shannon-Fano code $E$ induced by $\mu$, denoted by $D$, be
defined as follows. For all $p$, $D(p) = x$, if $E(x)$ is a prefix of $p$ or ($E(x^-) \in \mathrm{Left}(p)$
and $p \in \mathrm{Left}(E(x))$), and $D(p)$ is undefined otherwise. That is, $D(p) = x$ for any
string $p$ such that the binary interval defined by $p$ is contained in the interval
$(E(x^-), E(x)]$.

Theorem 4. E Assume that $\mu$ is $t$-computable. Then for some polynomial $p$,
the Shannon-Fano code $E$ and the inverse $D$ of the Shannon-Fano code induced
by $\mu$ are computable in time $p(t)$.

Obviously, if $\mu$ is universal for the class of $t$-computable distributions then
the Shannon-Fano code $E_\mu$ induced by $\mu$ is optimal in the following sense. For
every $t$-computable distribution $\nu$ there exists a constant $c$ such that for all $x$,
$|E_\mu(x)| \le |E_\nu(x)| + c$, where $E_\nu$ is the Shannon-Fano code induced by $\nu$. In
the following we discuss the properties of a Turing machine which, given E{x), 
computes x = D(E(x)) and show that universal Turing machines exist. 

3 Ordered Time-Bounded Prefix Complexity 

In this section we propose a stronger version of time bounded Kolmogorov com- 
plexity and show that universal Turing machines exist in this model. 

Recall the definition of the left set of a string $u$: $\mathrm{Left}(u) = \{v \mid \exists z :
z0$ is a prefix of $v$ and $z1$ is a prefix of $u\}$. The relation (defined by) Left is a
partial ordering on strings. Furthermore, for any set of prefix-free strings, Left
is a total ordering (i.e., if $u$ is not a prefix of $v$ and $v$ is not a prefix of $u$, then
$u \in \mathrm{Left}(v)$ or $v \in \mathrm{Left}(u)$).
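
The Left relation is easy to test mechanically: two strings are related exactly when they first differ at a position where one has 0 and the other has 1. The helper below is a small sketch; the name `in_left` is an assumption introduced for illustration.

```python
def in_left(v, u):
    """True iff v is in Left(u): some z has z0 as a prefix of v and z1 as a prefix of u."""
    for a, b in zip(v, u):
        if a != b:
            # First position where the strings differ decides the relation.
            return a == "0" and b == "1"
    return False  # one string is a prefix of the other (or they are equal)

# On a prefix-free set, any two distinct strings are comparable.
code = ["00", "011", "11"]
for v in code:
    for u in code:
        if v != u:
            assert in_left(v, u) or in_left(u, v)
```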

Let $t$ be a nondecreasing time-constructible function from $\Sigma^*$ to $\mathbb{N}$ and $M$
be a prefix Turing machine. $M$ is called $t$-time nondecreasing if for all $u$ and $v$,

1. if $M(u)$ stops then the number of steps of $M(u)$ is bounded by $t(|M(u)|)$.

2. if $M(u)$ stops, $v \in \mathrm{Left}(u)$, and $|u| > t(|u|)$ then $M(v)$ stops.

3. if $M(u)$ and $M(v)$ stop and $v \in \mathrm{Left}(u)$ then $M(v) \le M(u)$.

Let $M$ be a $t$-time nondecreasing prefix Turing machine. The nondecreasing prefix
Kolmogorov complexity (Knd-complexity) of a string $x$ w.r.t. $M$, in symbols
$\mathrm{Knd}_M(x)$, is defined as follows. For all $x$,

$\mathrm{Knd}_M(x) = \min\{|u| : u \in \Sigma^*,\ M(u) \text{ halts with output } x\}.$

Before we proceed to consider the existence of a universal Turing machine, we 
show that any string $x$ has linear-time Knd-complexity of length $|x| + 2\log|x| + c$
for some constant $c$.

Proposition 2. 1. There exists a constant $c$ and a linear-time nondecreasing
prefix Turing machine $M$ such that $\mathrm{Knd}_M(x) \le |x| + 2\log|x| + c$ for all
$x \ne \lambda$.

2. Let $c$ be a constant and $t$, $t(n) > n$, be a nondecreasing time-constructible
function. There exists a constant $d$, a polynomial $p$ and a $p(t)$-time nondecreasing
prefix Turing machine $N$ such that for all $x$, if $K^{t}(x) \le c \cdot \log|x|$
then $\mathrm{Knd}_N(x) \le d \log|x|$.
Proof. (1.) Consider the following coding of strings. For all $x \ne \lambda$ let $\bar{x} =
1^{\log|x|}\,0\,s(|x|)\,x$, where $s(|x|) = z_{n-1} \cdots z_0$, $z_i \in \{0,1\}$, such that $z_{n-1} = 1$ and
$|x| = z_{n-1} 2^{n-1} + \cdots + z_0 2^0$. First we show that the coding is increasing. Given
$y < x$, we distinguish three cases.

Assume that $\log|y| < \log|x|$. Then

$\bar{y} = 1^{\log|y|}\,0\,s(|y|)\,y \in \mathrm{Left}(1^{\log|y|+1}) \subseteq \mathrm{Left}(1^{\log|x|}) \subseteq \mathrm{Left}(\bar{x}).$

Assume that $\log|y| = \log|x|$ and $s(|y|) < s(|x|)$. Then

$\bar{y} \in \mathrm{Left}(1^{\log|y|}\,0\,s(|y| + 1)) \subseteq \mathrm{Left}(1^{\log|x|}\,0\,s(|x|)) \subseteq \mathrm{Left}(\bar{x}).$

Assume that $\log|y| = \log|x|$ and $s(|y|) = s(|x|)$ and $y < x$. Then

$\bar{y} \in \mathrm{Left}(1^{\log|y|}\,0\,s(|y|)\,x) = \mathrm{Left}(1^{\log|x|}\,0\,s(|x|)\,x) = \mathrm{Left}(\bar{x}).$
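
The three cases can be checked exhaustively on short strings. The sketch below assumes that $\log|x|$ stands for the number of bits of $s(|x|)$ and that strings are ordered first by length and then lexicographically; the names `encode`, `in_left` and `strings_up_to` are hypothetical helpers introduced for this illustration.

```python
from itertools import product

def in_left(v, u):
    """True iff v is in Left(u), i.e. v branches off to the left of u."""
    for a, b in zip(v, u):
        if a != b:
            return a == "0" and b == "1"
    return False

def encode(x):
    """Self-delimiting code: 1^k 0 s x, where s is |x| in binary and k = |s|."""
    s = format(len(x), "b")           # leading bit of s is 1 because |x| >= 1
    return "1" * len(s) + "0" + s + x

def strings_up_to(max_len):
    """Nonempty binary strings in length-lexicographic order."""
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

words = list(strings_up_to(5))
increasing = all(in_left(encode(words[i]), encode(words[j]))
                 for i in range(len(words)) for j in range(i + 1, len(words)))
print(increasing)
```

The code word for $x$ has length $|x| + 2k + 1$ with $k$ the bit length of $|x|$, which matches the bound $|x| + 2\log|x| + c$ of Proposition 2.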

It remains to show that there exists a linear-time nondecreasing Turing ma- 
chine M computing x from x. M can be defined as follows. 

On input y, if y G 1* then output “?”. Otherwise, choose k and z such 
that y is a prefix of l^Oz. If |z| < fc -|- 2^“^ then output “?”. Let z = 
bk-i ■ ■ ■ boXo ■ ■ ■ Xm- If Zk-i = 0 then output 0^ . Otherwise, let n = 

6i2*. If n > m output xq - ■ ■ Xn otherwise output “?”. 

(2.) Let $K[c \cdot \log n, t]$ denote the set of all strings $x$ such that $K^t(x) \le c \cdot
\log|x|$. Given $m$, the elements in $K[c \cdot \log n, t] \cap \Sigma^{m}$ can be enumerated in time
polynomial in $m$ and $t$. Therefore, there exists a polynomial-time nondecreasing
prefix machine $N$ which on input $x$ outputs the $x$th string in $K[c \cdot \log n, t]$.

In the following we show that for the Knd-complexity there is a reasonable notion 
of a universal Turing machine. Furthermore, (in contrast to what we know about 
time-bounded Kolmogorov complexity) given a string x it is possible to efficiently 
compute its Knd-complexity. 

In general it is not decidable whether a given Turing machine is ($t$-time)
nondecreasing; however, it is possible to augment the computation of a prefix
Turing machine to ensure that the machine is $t$-time nondecreasing.

Lemma 1. Let $t$, $t(n) > n$, be a nondecreasing time-constructible function.
There exists a prefix Turing machine $N$ such that

1. for every fixed string $e$, $N(e,p)$ is $O(n^2 t(n)^2 \log t(n))$ time nondecreasing,

2. for every $t$-time nondecreasing Turing machine $M$, there exists an index $e$
such that for all $p$, if $M(p)$ stops then $N(e,p) = M(p)$.

Proof. Let $M_1, M_2, \ldots$ be an effective enumeration of all prefix Turing machines.
Consider a machine Minv defined as follows.

Minv($M$, $x$)
    let $n = |x|$
    if $M(0^{t(n)})$ does not stop within $t(n)$ steps then output $\lambda$
    $v = \lambda$
    for $i = 1$ to $t(n)$ do
        if $M(v10^{t(n)-i})$ does not stop within $t(n)$ steps then $v = v0$
        else if $M(v10^{t(n)-i}) > x$ then $v = v0$ else $v = v1$
    output $v$
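
The bitwise binary search performed by Minv can be sketched as follows. This is an illustration under stated assumptions, not the machine itself: `M` is modelled as a Python function that returns an output or `None` when it does not halt in time, outputs are integers standing in for strings, and `T` plays the role of $t(|x|)$. The name `m_inv` and the toy machine are hypothetical.

```python
def m_inv(M, x, T):
    """Return the string v of length T found by the bitwise search of Minv.

    M: function from bit strings to an output, or None if M does not halt in time.
    x: target output; T: length of the programs that are probed."""
    if M("0" * T) is None:
        return ""                          # corresponds to "output lambda"
    v = ""
    for i in range(1, T + 1):
        out = M(v + "1" + "0" * (T - i))   # probe: tentatively set the next bit to 1
        if out is None or out > x:
            v += "0"                       # probe overshoots (or does not halt)
        else:
            v += "1"                       # probe is still at most x
    return v

# Toy nondecreasing machine: programs of length 4, output grows with the program.
toy = lambda p: int(p, 2) // 3
print(m_inv(toy, 2, 4))
```

For a machine that is nondecreasing on programs of length `T`, the loop returns the rightmost program whose output does not exceed `x`, using `T` probes instead of a scan over all $2^T$ programs.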
Assume that $M_e$ is $t$-time nondecreasing. In this case, Minv($M_e$, $x$) computes
(using binary search) a string $v_x$ (of length $t(|x|)$) such that $M_e(v_x) \le x$ and
$M_e(u) > x$ for all $u$ such that $v_x \in \mathrm{Left}(u)$ and $M_e(u)$ stops.

Now let $M_e$ be an arbitrary prefix machine such that $M_e(0^k)$ stops for some
$k$. Then for all $x$, $|x| > k$, Minv($M_e$, $x$) is a string of length $t(|x|)$ such that
$M_e(x)$ stops. Furthermore, for any $y$, if $|y| = |x|$ and $x < y$ then Minv($M_e$, $x$) is
equal to Minv($M_e$, $y$) or Minv($M_e$, $x$) is in Left(Minv($M_e$, $y$)).

Using the procedure Minv, we define an "inverse" $M_e^{inv}$ of $M_e$ in two steps as
follows. In the first step, for $l = 1, 2, \ldots, |x|$ a string $v_{1^l}$ is computed such that
$v_{1^l}$ is the maximal string $v$ of length $t(l)$ such that $M_e(v)$ halts and $M_e(v) \le 1^l$.
If $M_e$ is not $t$-time nondecreasing, then we make sure that for all $l$, $v_{1^l}$ is a prefix
of $v_{1^{l+1}}$ or $v_{1^l} \in \mathrm{Left}(v_{1^{l+1}})$.

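input $x$, $|x| = n$
    for $l = 1$ to $l = n$ let $v_{1^l} = \mathrm{Minv}(M_e, 1^l)$
    for $l = 1$ to $l = n - 1$ if $v_{1^l} \notin \mathrm{Left}(v_{1^{l+1}})$ then $v_{1^{l+1}} = v_{1^l}$

In the second part, we use binary search to find a string $v_x$
such that $M_e(v_x) \le x$ and $x < M_e(u)$ for all $u$ such that $v_x \in \mathrm{Left}(u)$ and $M_e(u)$
stops.

Let $v_{0^n} = \mathrm{Minv}(M_e, 0^n)$. If $v_{1^{n-1}} \notin \mathrm{Left}(v_{0^n})$ then $v_{0^n} = v_{1^{n-1}}$.
For all $1 \le i \le n$ let

$v_{x_1 \cdots x_i 0^{n-i}} = \begin{cases}
v_{x_1 \cdots x_{i-1} 0^{n-i+1}}, & \text{if } \mathrm{Minv}(M_e, x_1 \cdots x_i 0^{n-i}) < v_{x_1 \cdots x_{i-1} 0^{n-i+1}} \\
v_{x_1 \cdots x_{i-1} 1^{n-i+1}}, & \text{if } \mathrm{Minv}(M_e, x_1 \cdots x_i 0^{n-i}) > v_{x_1 \cdots x_{i-1} 1^{n-i+1}} \\
\mathrm{Minv}(M_e, x_1 \cdots x_i 0^{n-i}), & \text{otherwise,}
\end{cases}$

$v_{x_1 \cdots x_i 1^{n-i}} = \begin{cases}
v_{x_1 \cdots x_i 0^{n-i}}, & \text{if } \mathrm{Minv}(M_e, x_1 \cdots x_i 1^{n-i}) < v_{x_1 \cdots x_i 0^{n-i}} \\
v_{x_1 \cdots x_{i-1} 1^{n-i+1}}, & \text{if } \mathrm{Minv}(M_e, x_1 \cdots x_i 1^{n-i}) > v_{x_1 \cdots x_{i-1} 1^{n-i+1}} \\
\mathrm{Minv}(M_e, x_1 \cdots x_i 1^{n-i}), & \text{otherwise.}
\end{cases}$

Note that by definition $v_{0^n} \le M_e^{inv}(x) \le v_{1^n}$. Furthermore, for all $k \le n$,
$v_{x_1 \cdots x_k 0^{n-k}}$ ($v_{x_1 \cdots x_k 1^{n-k}}$) is defined such that $v_{x_1 \cdots x_k 0^{n-k}}$ ($v_{x_1 \cdots x_k 1^{n-k}}$) is
the maximal string $v$ of length $t(n)$ such that $M_e(v)$ halts and $M_e(v) \le x_1 \cdots x_k 0^{n-k}$
($M_e(v) \le x_1 \cdots x_k 1^{n-k}$). Furthermore, even if $M_e$ is not $t$-time nondecreasing
we make sure that $v_{x_1 \cdots x_k 0^{n-k}} \le v_{x_1 \cdots x_k 1^{n-k}}$ for all $k \le n$.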

The Turing machine $N : \Sigma^* \times \Sigma^* \to \Sigma^*$ can now be defined as follows. On
input $(e, v)$ compute a string $x$ such that $M_e^{inv}(x^-) \in \mathrm{Left}(v)$ but $M_e^{inv}(x) \notin
\mathrm{Left}(v)$. If $v \in \mathrm{Left}(M_e^{inv}(x))$ or $M_e^{inv}(x)$ is a prefix of $v$ then output $x$. Again,
assume that $M_e$ is $t$-time nondecreasing. In this case, if $M_e$ stops on input $u$ and
if $u$ is a prefix of $v$ then $M_e(u) = x$.

input $(e, v)$
    $m = 0$
    while $M_e^{inv}(1^m) \in \mathrm{Left}(v)$ do $m = m + 1$
    if $M_e^{inv}(0^m) \notin \mathrm{Left}(v)$ then $y =$
    else
        $y = \lambda$
        for $l = 1$ to $m - 1$
            if $M_e^{inv}(y10^{m-l}) \notin \mathrm{Left}(v)$ then $y = y0$ else $y = y1$
    let $x$ be the successor of $y$
    if $v \in \mathrm{Left}(M_e^{inv}(x))$ or $M_e^{inv}(x)$ is a prefix of $v$ then output $x$

It remains to verify the time complexity of $N$. Observe that for every $e$, $M_e^{inv}$
is computable in time $O(n\,t(n)^2)$. Since $M_e^{inv}$ is simulated $O(n)$ times (where $n$
is the length of the output) on strings of size at most $n$, the computation of $N$
is $O(n^2 t(n)^2 \log t(n))$ time bounded.

A Turing machine $U$ is called $t$-time nondecreasing universal, if $U$ is nondecreasing
$t'$-time bounded for a function $t'$ (which should be not much larger
than $t$) and for every nondecreasing $t$-time bounded Turing machine $M$ there
is a constant $c$ such that $\mathrm{Knd}_U(x) \le \mathrm{Knd}_M(x) + c$ for all $x$. We show that $t$-time
nondecreasing universal Turing machines exist. Furthermore, the universal
Turing machine is $t'$-time nondecreasing for some function $t'$ polynomial in $t$.

Theorem 5. Let $t$, $t(n) > n$, be a nondecreasing time-constructible function.
There exists a $n^3 \cdot t^2(n) \log t(n)$-time nondecreasing Turing machine which is
$t$-time nondecreasing universal.

Proof. For all $x$ and for all $e$ let $q_e(x) = \max\{p : |p| = t(|x|),\ N(e,p) \le x\}$, i.e., $q_e(x) =
M_e^{inv}(x)$, where $N(e,p)$ and $M_e^{inv}$ are defined in Lemma 1. Recall that we identify
a string $x$ with a (dyadic) rational number denoted by $0.x$. For all $x$, $|x| = n$,
let $q(x)$ be a string that satisfies the following equation:

$0.q(x) = 1/2 \cdot 0.q_1(x) + 1/4 \cdot 0.q_2(x) + \cdots + 1/2^n \cdot 0.q_n(x).$

Since $0.q_i(x) < 1$ for all $i$ and $\sum_{i=1}^{n} 1/2^i < 1$, $q(x)$ is well defined. Let $U$ be
defined as follows.

input $v$
    $m = 0$
    while $q(1^m) \in \mathrm{Left}(v)$ do $m = m + 1$
    if $q(0^m) \notin \mathrm{Left}(v)$ then $y =$
    else
        $y = \lambda$
        for $l = 1$ to $m - 1$
            if $q(y10^{m-l}) \notin \mathrm{Left}(v)$ then $y = y0$ else $y = y1$
    let $x$ be the successor of $y$
    if $v \in \mathrm{Left}(q(x))$ or $q(x)$ is a prefix of $v$ then output $x$

First we observe that $U$ is a nondecreasing $t'$-time bounded Turing machine.
If $U$ stops on input $v$ and outputs a string $x$ then either $q(x^-) \in \mathrm{Left}(v)$ and
$v \in \mathrm{Left}(q(x))$ or $q(x)$ is a prefix of $v$. Hence $U$ stops on input $vz$ for any $z$
and in this case $U(v) = U(vz)$. Now assume that $v \in \mathrm{Left}(u)$ and $U$ stops on
input $v$ and $u$. Let $y_u$ and $y_v$ denote the strings computed on input $u$ and $v$,
respectively. Since any string in Left($v$) is in Left($u$) it follows that $y_v \le y_u$ and
hence $U(v) \le U(u)$.

Secondly we consider the time complexity of $U$. Since by Lemma 1, for any $e$,
$q_e$ is computable in time $O(n\,t(n)^2)$, $q$ is computable in time $O(n^2 t(n)^2) \log n$.
Since $q$ is evaluated $O(n)$ times (where $n$ is the length of the output) on strings
of size at most $n$, the computation of $U$ is $O(n^3 t(n)^2 \log t(n))$ time bounded.

Now assume that $M_e$ is $t$-time nondecreasing. We show that $\mathrm{Knd}_U(x) \le
\mathrm{Knd}_{M_e}(x) + e + 1$. Assume that for some $x$, $|x| > e$, $M_e(u) = x$. Since $M_e$ is
nondecreasing and prefix-free, $0.v \le 0.u - 2^{-|u|}$ for all $v$ such that $M_e(v) < x$.
($0.v < 0.u$ since $M_e$ is $t$-time nondecreasing, and $|0.u - 0.v| \ge 2^{-|u|}$ since $M_e$
is a prefix machine, i.e. if $M_e$ stops on inputs $u$ and $v$, then $v$ is not a prefix
of $u$ and $u$ is not a prefix of $v$.) Since $q_i$ is nondecreasing for all $i$,

$0.q(x) \ge 0.q(x^-) + 2^{-(e+|u|)}.$

Let $q(x^-) = w_1 \ldots w_m$, where $w_i \in \{0, 1\}$ for all $i \le m$. Let $p$ be a string of
length $e + |u| + 1$ such that $0.p = 0.w_1 \cdots w_{e+|u|+1} + 2^{-(e+|u|+1)}$, where $w_i = 0$ for
all $m < i \le e + |u| + 1$. Then either $q(x^-) \in \mathrm{Left}(p)$ and $p \in \mathrm{Left}(q(x))$ or $q(x)$ is
a prefix of $p$. Since $|p| = |u| + e + 1$ it follows that $\mathrm{Knd}_U(x) \le \mathrm{Knd}_{M_e}(x) + e + 1$.

In the following we fix for every $t$ such a $t$-time nondecreasing universal
Turing machine $U_t$ and define $\mathrm{Knd}_t(x) = \mathrm{Knd}_{U_t}(x)$ for all $x$.

3.1 Time Bounded Universal Distributions 

Let $t$ be a time-constructible nondecreasing function and $U_t$ a $t$-time nondecreasing
universal Turing machine. Then the $t$-time bounded algorithmic probability
$R_t$ is defined as follows. For all $x$, $R_t(x) = 2^{-\mathrm{Knd}_t(x)}$. Then a $t$-time bounded
a priori probability $Q_{U_t}$ is defined as follows. For all $x$, $Q_{U_t}(x) = \sum_p 2^{-|p|}$, where
$p$ ranges over all strings such that $U_t(p) = x$ but $U_t(p')$ does not stop for any
prefix $p'$ of $p$.

Let $p$ be a polynomial and $\mu_1, \mu_2, \ldots$ be an enumeration of $p(t)$-computable
semimeasures containing all $t$-computable semimeasures. Then $m_t(x) =
\sum_{n \le |x|} \alpha(n) \cdot \mu_n(x)$ is universal for the class of $t$-computable semimeasures.
Furthermore, $m_t$ is computable in time polynomial in $t$.

Theorem 6. Let $t$ be a nondecreasing time-constructible function. There is a
constant $c$ and polynomials $p_1$, $p_2$, and $p_3$ such that for every $x$, $-\log m_t(x) \le
-\log Q_{p_1(t)}(x) \le -\log R_{p_2(t)}(x) \le -\log m_{p_3(t)}(x)$ up to an additive constant
less than $c$.

Proof. Recall from Section |^| the definition of a Shannon-Fano code E induced 
by mt, and the inverse D of E. Then for all x, — logTO((x) < |A(x)| 4-2. Since D 
is a nondecreasing prefix code and computable in time p(f) for some polynomial 
p, it follows that for all x there exists a string v of length |A(x)| -I- d such that 
Upi (p) = X where d is some constant independent of x. Hence 2^^+^ • 
mt{x). 

Since U is t-time nondecreasing, the probability of a string p such that 
Ut{p) = X is at most 4-2 “ where g is a shortest string such that Ut{q) = x. That 
is 4 • Knd(7t(a;) > where p ranges over all strings such that Ut{p) = x 

but U{p') does not stop for any prefix p' of p. That is 4 • (a^) < Rp^ix). 
Finally note that $\mu(x) = 0.p - 0.q$, where $p = \mathrm{Knd}_t(x)$ and $q = \mathrm{Knd}_t(x^-)$
defines a semimeasure computable in time p{t) for some polynomial p. Hence, 
for some constant d' and all x, d! • Rp^(ff{x) < — logmp 3 p)(a;). 

Since by definition $-\log R_{p_2(t)}(x) = \mathrm{Knd}_{p_2(t)}(x)$, Theorem 6 can be seen as a
justification of the definition of time-bounded nondecreasing Kolmogorov complexity.
Finally we note that the concept classes shown to be simple PAC-learnable in the
$t$-time bounded setting [Zj are simple PAC-learnable w.r.t. the universal
distribution $m_t$.

A concept class is simple PAC-learnable (in the $t$-time bounded setting) if
it is PAC-learnable [m w.r.t. any simple ($t$-time bounded) distribution. In j0|
Li and Vitányi show that any concept class that is PAC-learnable under a
$t$-universal distribution is PAC-learnable under all $t$-computable distributions,
provided that (in the learning phase) the examples are drawn according to the
universal distribution. In 0 Li and Vitányi (see also [2]) give several examples
of concept classes that are PAC-learnable w.r.t. this distribution, e.g. the
class of DNF of logarithmic Kolmogorov complexity.

We observe that the proofs of the learnability results depend on the fact
that examples of low $t$-time bounded prefix Kolmogorov complexity have high
probability w.r.t. $m_{(t)}$. By Proposition 2 and Theorem 6 this holds for any
$p(t)$-universal distribution where $p$ is some fixed polynomial.
Abstract. Building upon the known generalized-quantifier-based first-order
characterization of LOGCFL, we lay the groundwork for a deeper
investigation. Specifically, we examine subclasses of LOGCFL arising
from varying the arity and nesting of groupoidal quantifiers. Our work
extends the elaborate theory relating monoidal quantifiers to NC$^1$ and
its subclasses. In the absence of the BIT predicate, we resolve the main
issues: we show in particular that no single outermost unary groupoidal
quantifier with FO can capture all the context-free languages, and we obtain
the surprising result that a variant of Greibach's "hardest context-free
language" is LOGCFL-complete under quantifier-free BIT-free interpretations.
We then prove that FO with unary groupoidal quantifiers is
strictly more expressive with the BIT predicate than without. Considering
a particular groupoidal quantifier, we prove that first-order logic with
majority of pairs is strictly more expressive than first-order with majority
of individuals. As a technical tool of independent interest, we define the
notion of an aperiodic nondeterministic finite automaton and prove that
FO translations are precisely the mappings computed by single-valued
aperiodic nondeterministic finite transducers.



1 Introduction 

In Finite Automata, Formal Logic, and Circuit Complexity [HI, Howard Straubing
surveys an elegant theory relating finite semigroup theory, first-order logic,
and computational complexity. The gist of this theory is that questions about the
structure of the complexity class NC$^1$, defined from logarithmic-depth bounded
fan-in Boolean circuits, can be translated back and forth into questions about the 
expressibility of first-order logic augmented with new predicates and quantifiers. 
Such a translation provides new insights, makes tools from one field available 

* Research performed while on leave at the Universität Tübingen. Supported by the
(German) DFG, the (Canadian) NSERC and the (Québec) FCAR.
in the other, suggests tractable refinements to the hard open questions in the
separate fields, and puts the obstacles to further progress in a clear perspective.

In this way, although, for example, the unresolved strict containment in NC$^1$
of the class ACC$^0$, defined from bounded-depth polynomial-size unbounded fan-in
circuits over {AND, OR, MOD}, remains a barrier since the work of Smolensky
m, significant progress was made in (1) understanding the power of the BIT
predicate and the related circuit uniformity issues 0, (2) describing the regular
languages within subclasses of NC$^1$ 0IE1, and (3) identifying the all-important
role of the interplay between arbitrary and regular numerical predicates in the
status of the ACC$^0$ versus NC$^1$ question [d p. 169, Conjecture IX.3.4].

Barrington, Immerman and Straubing P| introduced the notion of a monoidal
quantifier and noted that, for any non-solvable group $G$, the class NC$^1$ can be
described using first-order logic augmented with a monoidal quantifier for $G$.
Loosely speaking, such a quantifier provides a constrained “oracle call” to the 
word problem for G (defined essentially as the problem of computing the product 
of a sequence of elements of G) . 

Bedard, Lemieux and McKenzie @] later noted that there is a fixed finite 
groupoid whose word problem is complete for the class LOGCFL of languages 
reducible in logarithmic space to a context-free language peg. A groupoid G 
is a set with a binary operation on which no constraint — such as associativity 
or commutativity — is placed. The word problem for G is the set of all those 
sequences of elements of G that can be bracketed to evaluate to a fixed element 
of G. It is not hard to see that any context-free language is the word problem 
of some groupoid, and that any groupoid word problem is context-free (see pfl 
Lemma 3.1]). 

It followed that LOGCFL, a well-studied class which contains nondeterministic
logarithmic space and is presumably much larger than NC$^1$, can be
described by first-order logic augmented with groupoidal quantifiers. These
quantifiers can be defined formally as Lindström quantifiers nm for context-free
languages.

In this paper, we take up this first-order characterization of LOGCFL, and 
initiate an investigation of LOGCFL from the viewpoint of descriptive complexity.
The rationale for this study, which encompasses the study of NC$^1$, is
that tools from logic might be of use in ultimately elucidating the structure of 
LOGCFL. We do not claim new separations of the major subclasses of LOGCFL 
here. But we make a first step, in effect settling necessary preliminary questions 
afforded by the first-order framework. 

Our precise results concern the relative expressiveness of first-order formulas 
with ordering (written FO), interpreted over finite strings, and with: (1) nested 
versus unnested groupoidal quantifiers, (2) unary versus non-unary groupoidal 
quantifiers, (3) the presence versus the absence of the BIT predicate. Feature 
(3) was the focus of an important part of the work by Barrington, Immerman
and Straubing P| on uniformity within NC$^1$. Feature (2) was also considered, to
a lesser extent, by the same authors, who left open the question of whether the
"majority-of-pairs" quantifier could be simulated by a unary majority quantifier
in the absence of the BIT predicate Pl p. 297]. Feature (1) is akin to comparing
many-one reducibility with Turing reducibility in traditional complexity theory.

Here we examine all combinations of features (1), (2) and (3). Our separation
results are summarized in the figure in Sect. 5. In the absence of the BIT predicate,
we are able to determine the following relationships:

- FO to which a single unary groupoidal quantifier is applied, written $Q^{un}_{Grp}$FO,
captures the CFLs, and is strictly less expressive than FO with nested unary
quantifiers, written FO($Q^{un}_{Grp}$), which in turn is strictly weaker than
LOGCFL. A consequence of this result, as we will see, is an answer to the
above-mentioned open question from 0: We show that first-order logic with
the majority-of-pairs quantifier is strictly more expressive than first-order
logic with majority of individuals, see Corollary ITM

- No single groupoid $G$ captures all the CFLs as $Q^{un}_G$FO, i.e. as FO to which
the single unary groupoidal quantifier $Q^{un}_G$ is applied,

- FO to which a single non-unary groupoidal quantifier is applied, written
$Q_{Grp}$FO, captures LOGCFL; our proof implies, remarkably, that adding a
padding symbol to Greibach's hardest context-free language 0, see also 0,
yields a language which is LOGCFL-complete under BIT-free quantifier-free
interpretations.

When the BIT predicate is present, first-order with non-unary groupoidal
quantifiers of course still describes LOGCFL. In the setting of monoidal quantifiers
0, FO with BIT is known to capture uniform circuit classes, notably uniform
ACC$^0$, which have not yet been separated from NC$^1$. We face a similar situation
here: the BIT predicate allows capturing classes (for example FO$_{bit}$($Q^{un}_{Grp}$),
verifying TC$^0 \subseteq$ FO$_{bit}$($Q^{un}_{Grp}$) $\subseteq$ LOGCFL), which only a major breakthrough
would seem to allow separating from each other. We are able to attest to the
strength of the BIT predicate in the setting of unary quantifiers, proving that:

- $Q^{un}_{Grp}$FO $\subset$ $Q^{un}_{Grp}$FO$_{bit}$, i.e. (trivially) some non-context-free languages are
expressible using BIT and a single unary groupoidal quantifier,

- FO($Q^{un}_{Grp}$) $\subset$ FO$_{bit}$($Q^{un}_{Grp}$), i.e. (more interestingly) BIT adds expressivity
even when unary groupoidal quantifiers can be nested.

We also develop a technical tool of independent interest, in the form of an aperiodic
(a.k.a. group-free, a.k.a. counter-free) nondeterministic finite automaton.
Aperiodicity has been studied intensively, most notably in connection with the
star-free regular languages but, to the best of our knowledge, always in a
deterministic context. Here we define an NFA $A$ to be aperiodic if the DFA resulting
from applying the subset construction to $A$ is aperiodic. The usefulness of
this notion lies in the fact, proved here, that first-order translations are precisely 
those mappings which are computable by single-valued aperiodic nondetermin- 
istic finite transducers. 

Due to lack of space most proofs are omitted in this abstract. A full paper
including complete proofs of all our claims can be obtained as ECCC Report
98-59 from http://www.eccc.uni-trier.de.

2 Preliminaries

2.1 Complexity Theory 

REG and CFL refer to the regular and to the $\varepsilon$-free context-free languages
respectively. The CFL results in this paper could be adapted to treat the empty
string $\varepsilon$ in standard ways. We will make scant reference to the inclusion chain
AC$^0 \subseteq$ ACC$^0 \subseteq$ TC$^0 \subseteq$ NC$^1 \subseteq$ NL $\subseteq$ LOGCFL = SAC$^1 \subseteq$ P. We refer the
reader to H2| for definitions of these classes.

2.2 The First-Order Framework 

We consider first-order logic with linear order. We restrict our attention to string
signatures, i.e. signatures of the form $(P_{a_1}, \ldots, P_{a_s})$, where all the predicates $P_{a_i}$
are unary, and in every structure $\mathcal{A}$, $\mathcal{A} \models P_{a_i}(j)$ iff the $j$th symbol in the input
is the letter $a_i$. Such structures are thus words over the alphabet $(a_1, \ldots, a_s)$,
and first-order variables range over positions within such a word, i.e. from 1 to
the word length $n$. For technical reasons that will become apparent shortly, we
assume here, as in the rest of the paper, a linear order on each alphabet and we
write alphabets as sequences of symbols to indicate that order.

Our basic formulas are built from variables in the usual way, using the
Boolean connectives $\{\wedge, \vee, \neg\}$, the relevant predicates $P_{a_i}$ together with $\{=, <\}$,
the constants min and max, the quantifiers $\{\exists, \forall\}$, and parentheses. We will
occasionally use the binary predicate BIT($x$, $y$), defined to be true iff the $x$th bit
in the binary representation of $y$ is 1. We write BC($\mathcal{L}$) to denote the Boolean
closure of the set $\mathcal{L}$ of languages (i.e. closure under intersection, union, and
complement) and BC$^+$($\mathcal{L}$) to denote the closure under union and intersection
only.

Definition 1. Lindström quantifier. Consider a language $L$ over an alphabet
$\Sigma = (a_1, a_2, \ldots, a_s)$, $s > 1$. Let $\bar{x}$ be a $k$-tuple of variables (each of which ranges
from 1 to the "input length" $n$, as we have seen). In the following, we assume the
lexical ordering on $\{1, 2, \ldots, n\}^k$, and we write $X_1, X_2, \ldots, X_{n^k}$ for the sequence
of potential values taken on by $\bar{x}$. The $k$-ary groupoidal quantifier $Q_L$ binding $\bar{x}$
takes a meaning if $s - 1$ formulas, each having as free variables the variables in $\bar{x}$
(and possibly others), are available. Let $\phi_1(\bar{x}), \phi_2(\bar{x}), \ldots, \phi_{s-1}(\bar{x})$ be these $s - 1$
formulas. Then $Q_L \bar{x}[\phi_1(\bar{x}), \phi_2(\bar{x}), \ldots, \phi_{s-1}(\bar{x})]$ holds on a string $w = w_1 \cdots w_n$
(where $w_i \in \Sigma$ for all $i$), iff the word of length $n^k$ whose $i$th letter, $1 \le i \le n^k$,
is

$\begin{cases}
a_1 & \text{if } w \models \phi_1(X_i), \\
a_2 & \text{if } w \models \neg\phi_1(X_i) \wedge \phi_2(X_i), \\
\;\vdots \\
a_s & \text{if } w \models \neg\phi_1(X_i) \wedge \neg\phi_2(X_i) \wedge \ldots \wedge \neg\phi_{s-1}(X_i)
\end{cases}$

belongs to $L$. Thus the formulas $[\phi_1(\bar{x}), \phi_2(\bar{x}), \ldots, \phi_{s-1}(\bar{x})]$ fix a function mapping
an input word/structure $w$ of length $n$ to a word of length $n^k$. This function
is called the reduction or transformation defined by $[\phi_1(\bar{x}), \phi_2(\bar{x}), \ldots, \phi_{s-1}(\bar{x})]$.
In case we deal with the binary alphabet ($s = 2$) we omit the braces and write
$Q_L \bar{x}\,\phi(\bar{x})$ for short.
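
The reduction defined by a tuple of formulas can be sketched directly. In the illustration below, formulas are modelled as Python predicates taking the word and a tuple of 1-based positions, and the language $L$ as a membership test; the names `lindstrom_reduction` and `holds` are assumptions introduced here, not notation from the paper.

```python
from itertools import product

def lindstrom_reduction(w, formulas, alphabet, k=1):
    """Map w (length n) to the word of length n^k described in Definition 1.

    formulas: s-1 predicates phi(w, X); alphabet: the ordered letters a_1..a_s."""
    n = len(w)
    out = []
    for X in product(range(1, n + 1), repeat=k):   # tuples in lexical order
        for letter, phi in zip(alphabet, formulas):
            if phi(w, X):
                out.append(letter)                 # first formula that holds
                break
        else:
            out.append(alphabet[-1])               # no formula holds: a_s
    return "".join(out)

def holds(w, formulas, alphabet, in_L, k=1):
    """Truth value of the quantified formula on w, given membership test in_L."""
    return in_L(lindstrom_reduction(w, formulas, alphabet, k))

# Binary alphabet (s = 2), one formula with two free variables.
same_letter = lambda w, X: w[X[0] - 1] == w[X[1] - 1]
print(lindstrom_reduction("10", [same_letter], ("a", "b"), k=2))
```

With `k=1` the image has the same length as the input, which is the length-preserving case that the unary quantifiers of Section 2.3 rely on.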



Definition 2. A groupoidal quantifier is a Lindström quantifier $Q_L$ where $L$ is
a context-free language.

The Lindström quantifiers of Definition 1 are more precisely what has been
referred to as "Lindström quantifiers on strings" |S|. The original more general
definition m uses transformations to arbitrary structures, not necessarily of
string signature. However, in the context of this paper reductions to CFLs play
a role of utmost importance, and hence the above definition seems to be the
most natural. The terminology groupoidal quantifier stems from the fact that
any context-free language is a word problem over some groupoid ^ Lemma 3.1],
and vice versa every word problem of a finite groupoid is context-free. Thus a
Lindström quantifier on strings defined by a context-free language is nothing
else than a Lindström quantifier (in the classical sense) defined by a structure
that is a finite groupoid multiplication table.

Barrington, Immerman, and Straubing, defining monoidal quantifiers in Pj,
in fact proceed along the same avenue: they first show how monoid word problems
can be seen as languages, and then define generalized quantifiers given by such
languages (see [3, pp. 284f.]).



2.3 Unary Quantifiers and Homomorphisms 

We will encounter unary groupoidal quantifiers repeatedly. Here we show how
these relate to standard formal language operations. In a different context, a
result very similar to the next theorem is known as Nivat's Theorem m Theorem
3.8, p. 207].

Theorem 3. Let $B$ be an arbitrary language, and let $A$ be describable in $Q^{un}_B$FO,
that is, by a first-order formula preceded by one unary Lindström quantifier
(i.e. binding exactly one variable). Then there are length-preserving homomorphisms
$g$, $h$ and a regular language $D$ such that $A = h(D \cap g^{-1}(B))$.



2.4 Groupoid-Based Language Classes 

Fix a finite groupoid $G$. Each $S \subseteq G$ defines a language $W(S, G)$ composed of
all words $w$, over the alphabet $G$, which "multiply out" to an element of $S$ when
an appropriate legal bracketing of $w$ is chosen.
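
Membership in $W(S, G)$ can be decided by a CYK-style dynamic program over all bracketings. The sketch below is an illustration only: the multiplication table is given as a dictionary, the empty word is rejected by assumption, and the function name `in_word_problem` is hypothetical.

```python
def in_word_problem(word, mult, S):
    """True iff some bracketing of word evaluates to an element of S.

    word: sequence of groupoid elements; mult: dict mapping (a, b) to a*b."""
    n = len(word)
    if n == 0:
        return False                     # assumption: the empty word is not in W(S, G)
    # reach[(i, j)] = set of values obtainable from word[i:j] under some bracketing
    reach = {(i, i + 1): {word[i]} for i in range(n)}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            values = set()
            for m in range(i + 1, j):    # outermost product splits word[i:j] at m
                for a in reach[(i, m)]:
                    for b in reach[(m, j)]:
                        values.add(mult[(a, b)])
            reach[(i, j)] = values
    return bool(reach[(0, n)] & set(S))

# Toy non-associative groupoid on {0, 1, 2}: a * b = (a - b) mod 3.
mult = {(a, b): (a - b) % 3 for a in range(3) for b in range(3)}
print(in_word_problem([1, 1, 1], mult, {2}))
```

The table has $O(n^2)$ entries, each a subset of $G$, so the running time is polynomial in $|w|$ for a fixed groupoid, consistent with groupoid word problems being context-free.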

Definition 4. $Q_G$FO is the set of languages describable by applying a single
groupoidal quantifier $Q_L$ to an appropriate tuple of FO formulas, where
$L = W(S, G)$ for some $S \subseteq G$.

$Q_{Grp}$FO is the union, over all finite groupoids $G$, of $Q_G$FO.
FO($Q_G$) and FO($Q_{Grp}$) are defined analogously, but allowing groupoidal
quantifiers to be used as any other quantifier would (i.e. allowing arbitrary nesting).
$Q^{un}_G$FO and FO$_{bit}$($Q_{Grp}$) are defined analogously, but possibly allowing the
BIT predicate (signaled by subscripting FO with bit) and/or restricting to unary
groupoidal quantifiers (signaled by the superscript "un").



3 An Automaton Characterization of FO-Translations 

As a technical tool, it will be convenient to have an automata-theoretic characterization
of first-order translations, i.e. of reductions defined by FO-formulas with
one free variable. Since FO precisely describes the (regular) languages accepted
by aperiodic deterministic finite automata, one might expect aperiodic deterministic
finite transducers to capture FO-translations. This is not the case
however because, e.g. the FO-translation which maps every string $w_1 \cdots w_n$ to
$w_n^n$ cannot be computed by such a device.

We show in this section that the appropriate automaton model to use is
that of a single-valued aperiodic nondeterministic finite transducer, which we
define and associate with FO-translations in this section. But first, we discuss
the notion of an aperiodic NFA.

Definition 5. A deterministic or nondeterministic FA M is aperiodic (or
group-free) iff there is an $n \in \mathbb{N}$ such that for all states s and
all words w,

$\delta(s, w^n) = \delta(s, w^{n+1})$.

Here $\delta$ is the extension of M's transition function from symbols to words.
Observe that if M is nondeterministic then $\delta(s, w)$ is a set of states,
i.e. locally here we abuse notation by not distinguishing between M's extended
transition function $\delta$ and the function $\delta^*$ as defined in the
context of a nondeterministic transducer below.
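For a complete DFA, the condition of Definition 5 can be tested on the transition monoid: every word induces a transformation of the state set, and the automaton is aperiodic exactly when each such transformation f satisfies f^n = f^(n+1), where taking n to be the number of states suffices. The sketch below illustrates this under stated assumptions; the names `transition_monoid`, `power` and `is_aperiodic`, the dictionary encoding of the transition function, and the two example automata are hypothetical and not part of the paper.

```python
def transition_monoid(states, alphabet, delta):
    """All state transformations induced by nonempty words of a complete DFA.
    A transformation is a tuple t with t[i] = index of the image of states[i]."""
    index = {q: i for i, q in enumerate(states)}
    gens = [tuple(index[delta[(q, a)]] for q in states) for a in alphabet]
    seen = set(gens)
    todo = list(gens)
    while todo:
        f = todo.pop()
        for g in gens:
            h = tuple(g[f[i]] for i in range(len(states)))  # apply f, then g
            if h not in seen:
                seen.add(h)
                todo.append(h)
    return seen


def power(f, k):
    """k-fold composition of the transformation f with itself."""
    result = tuple(range(len(f)))
    for _ in range(k):
        result = tuple(f[result[i]] for i in range(len(f)))
    return result


def is_aperiodic(states, alphabet, delta):
    n = len(states)
    return all(power(f, n) == power(f, n + 1)
               for f in transition_monoid(states, alphabet, delta))


# Hypothetical examples: a parity counter (contains a group) and a DFA
# that remembers whether the letter 'a' has been read (group-free).
PARITY = {(0, "a"): 1, (1, "a"): 0}
SEEN_A = {(0, "a"): 1, (1, "a"): 1, (0, "b"): 0, (1, "b"): 1}
```

The parity counter fails the test because the letter `a` swaps the two states forever, whereas in the second automaton every word either fixes both states or sends both to state 1.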



Remark 6. This definition of aperiodicity for a DFA is the usual one. For an
NFA, a statement obviously equivalent to Definition 5 would be that A is
aperiodic iff applying the subset construction to A yields an aperiodic DFA.
Hence a language L is star-free iff some aperiodic (deterministic or
nondeterministic) finite automaton accepts L.



Definition 7. A finite transducer M is given by a set Q of states, an input
alphabet $\Sigma$, an output alphabet $\Gamma$, an initial state $q_0$, a
transition relation $\delta \subseteq Q \times \Sigma \times \Gamma \times Q$
and a set $F \subseteq Q$ of final states. For a string
$w = w_1 \cdots w_n \in \Sigma^*$ we define the set $O_M(w)$ of outputs of M on
input w as follows. A string $v \in \Gamma^*$ of length n is in $O_M(w)$ if
there is a sequence $s_0 = q_0, s_1, \ldots, s_n$ of states such that
$s_n \in F$ and, for every i, $1 \leq i \leq n$, we have
$(s_{i-1}, w_i, v_i, s_i) \in \delta$.

We say that M is single-valued if, for every $w \in \Sigma^*$, $|O_M(w)| = 1$.
If M is single-valued it naturally defines a function
$f_M : \Sigma^* \to \Gamma^*$.

For every string $u \in \Sigma^*$ and every state $s \in Q$ we write
$\delta^*(s, u)$ for the set of states s' that are reachable from s on input u
(i.e., there are states $s_0 = s, s_1, \ldots, s_{|u|} = s'$ and a string
$v_1 \cdots v_{|u|}$ such that, for every i, $1 \leq i \leq |u|$, we have
$(s_{i-1}, u_i, v_i, s_i) \in \delta$).

As per Definition 5, M is aperiodic if there is an $n \in \mathbb{N}$ such that
for all states q and all strings w, $\delta^*(q, w^n) = \delta^*(q, w^{n+1})$.

Theorem 8. A function $f : \Sigma^* \to \Gamma^*$ is defined by an
FO-translation if and only if it is defined by a single-valued aperiodic finite
transducer.
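The output set of Definition 7 can be computed by propagating, letter by letter, the set of output strings that lead to each reachable state. The sketch below is illustrative only: the function names, the encoding of the transition relation as a set of 4-tuples, and the example transducer `LAST` are assumptions. `LAST` guesses the last input letter at the first step and replaces every letter by that guess, which is how we read the example of a translation that a deterministic device cannot compute.

```python
from itertools import product


def outputs(word, delta, q0, final):
    """The set O_M(word) for a transducer whose transition relation `delta`
    is a set of tuples (state, input_symbol, output_symbol, next_state)."""
    # configs maps each reachable state to the output strings leading to it.
    configs = {q0: {""}}
    for a in word:
        nxt = {}
        for (s, x, y, t) in delta:
            if x == a and s in configs:
                nxt.setdefault(t, set()).update(v + y for v in configs[s])
        configs = nxt
    return set().union(*(outs for s, outs in configs.items() if s in final))


def single_valued_up_to(max_len, alphabet, delta, q0, final):
    """Check |O_M(w)| = 1 for all words of length at most max_len
    (a finite sanity check, not a proof of single-valuedness)."""
    return all(len(outputs("".join(w), delta, q0, final)) == 1
               for n in range(max_len + 1)
               for w in product(alphabet, repeat=n))


# Hypothetical transducer over {a, b}: state "g+" means "guess g, and the
# letter just read was g"; state "g-" means the letter just read was not g.
LAST = set()
for g in "ab":
    for x in "ab":
        target = g + ("+" if x == g else "-")
        for source in ("i", g + "+", g + "-"):
            LAST.add((source, x, g, target))
LAST_FINAL = {"i", "a+", "b+"}
```

Only the run whose guess matches the final letter ends in a final state, so exactly one output survives for every input; for instance `outputs("abb", LAST, "i", LAST_FINAL)` keeps only the run that guessed `b`.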

4 First-Order with Groupoidal Quantifiers 

4.1 Non-unary Groupoidal Quantifiers 

Clearly all classes from our framework fall within LOGCFL. But even one
groupoidal quantifier to the left is sufficient:

Theorem 9. There is a fixed groupoid G such that
$Q_G\mathrm{FO}_{\mathrm{bit}} = \mathrm{FO}_{\mathrm{bit}}(Q_{\mathrm{Grp}}) = \mathrm{LOGCFL}$.

As we show next, the largest attainable class LOGCFL is even characterizable
without BIT, as long as we allow arbitrary arities of the quantifiers:

Theorem 10. There is a fixed groupoid G such that
$\mathrm{LOGCFL} \subseteq Q_G\mathrm{FO}$, hence
$Q_{\mathrm{Grp}}\mathrm{FO} = \mathrm{FO}(Q_{\mathrm{Grp}}) = \mathrm{LOGCFL}$.

Corollary 11. Greibach's hardest context-free language with a neutral symbol
is complete for LOGCFL under quantifier-free interpretations without BIT.

4.2 Unary Groupoidal Quantifiers without BIT 

In the previous subsection, we have shown that the situation with non-unary 
groupoidal quantifiers is clearcut, since a single such quantifier, even without 
the BIT predicate, captures all of LOGCFL. Here we examine the case of unary 
quantifiers. Let us first turn to the case of unary groupoidal quantifiers without 
BIT. 

Theorem 12. $Q_{\mathrm{Grp}}^{\mathrm{un}}\mathrm{FO} = \mathrm{CFL}$.

Proof. The direction from right to left is obvious. The converse direction is
proved by appealing to Theorem 3 and observing that the context-free languages
have the required closure properties.

It follows immediately that nesting unary groupoidal quantifiers (in fact,
merely taking the Boolean closure of $Q_{\mathrm{Grp}}^{\mathrm{un}}$FO) adds
expressiveness:

Corollary 13.
$Q_{\mathrm{Grp}}^{\mathrm{un}}\mathrm{FO} = \mathrm{CFL} \subsetneq \mathrm{BC}^+(Q_{\mathrm{Grp}}^{\mathrm{un}}\mathrm{FO}) = \mathrm{BC}^+(\mathrm{CFL})$
$\subsetneq \mathrm{BC}(Q_{\mathrm{Grp}}^{\mathrm{un}}\mathrm{FO}) = \mathrm{BC}(\mathrm{CFL})$
$\subsetneq \mathrm{FO}(Q_{\mathrm{Grp}}^{\mathrm{un}})$.

Can we refine Theorem 12 and find a universal finite groupoid G which captures
all the context-free languages as $Q_G^{\mathrm{un}}$FO? Intuition from the
world of monoids (p. 303 of the cited reference) suggests that the answer is no.
Proving that this is indeed the case is the content of Theorem 15 below. We
first make a definition and state a lemma.

Let Dt be the context-free one-sided Dyck language over 2t symbols, i. e. Dt 
consists of the well-bracketed words over an alphabet of t distinct types of paren- 
theses. Recall that a PDA is a nondeterministic automaton which reads its input 
from left to right and has access to a pushdown store with a fixed pushdown al- 
phabet. We say that a PDA A is k-pushdown-limited, for k a positive integer, iff 
the pushdown alphabet of A has size k, and A pushes no more than k symbols 
on its stack between any two successive input head motions. 
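For orientation, membership in the Dyck language can be tested with a single stack, pushing one symbol per opening parenthesis. The sketch below is an illustration and not part of the paper's argument; the function name `in_dyck` and the dictionary mapping each opening symbol to its closing partner are assumptions.

```python
def in_dyck(word, pairs):
    """True iff `word` is well-bracketed. `pairs` maps each opening
    symbol to its matching closing symbol (t = len(pairs) types)."""
    opener_of = {close: open_ for open_, close in pairs.items()}
    stack = []
    for ch in word:
        if ch in pairs:                      # opening parenthesis: push it
            stack.append(ch)
        elif ch in opener_of:                # closing parenthesis: must match
            if not stack or stack.pop() != opener_of[ch]:
                return False
        else:                                # symbol outside the alphabet
            return False
    return not stack                         # nothing may remain open
```

This recognizer uses a pushdown alphabet with one symbol per parenthesis type, so its pushdown alphabet grows with t; the lemma that follows concerns automata whose pushdown alphabet and push length are both bounded by a fixed k.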

Lemma 14. No k-pushdown-limited PDA accepts $D_t$ when $t \geq (k+1)^k + 1$.



Theorem 15. Any finite groupoid G verifies
$Q_G^{\mathrm{un}}\mathrm{FO} \subsetneq \mathrm{CFL}$.

Proof. Suppose to the contrary that G is a finite groupoid such that
$Q_G^{\mathrm{un}}\mathrm{FO} = \mathrm{CFL}$. Then there is an FO-translation
from each context-free language to a word problem for G. This means that a
finite set of PDAs (one for each word problem $W(\cdot, G)$) can take care of
answering each "oracle question" resulting from such an FO-translation. By
Theorem 8 each FO-translation is computed by a single-valued NFA. Although the
NFAs differ for different context-free languages (and this holds in particular
when language alphabets differ), the NFAs do not bolster the "pushdown-limits"
of the PDAs which answer all oracle questions. Hence if k is a fixed integer
such that all word problems $W(\cdot, G)$ for G are accepted by a
k-pushdown-limited PDA, then for any positive integer t, $D_t$ is accepted by a
k-pushdown-limited PDA. This contradicts Lemma 14 when $t = (k+1)^k + 1$.

In the next subsection we will see that the BIT predicate provably adds
expressive power to the logic $Q_{\mathrm{Grp}}^{\mathrm{un}}$FO. Since it is
known that BIT can be expressed either by plus and times or by the majority of
pairs quantifier, the following two simple observations about the power of
$Q_{\mathrm{Grp}}^{\mathrm{un}}$FO are of particular interest.

Theorem 16. 1. The majority quantifier is definable in
$Q_{\mathrm{Grp}}^{\mathrm{un}}$FO.

2. Addition is definable in $Q_{\mathrm{Grp}}^{\mathrm{un}}$FO.

4.3 Unary Groupoidal Quantifiers with BIT 

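What are $Q_{\mathrm{Grp}}^{\mathrm{un}}\mathrm{FO}_{\mathrm{bit}}$ and
$\mathrm{FO}_{\mathrm{bit}}(Q_{\mathrm{Grp}}^{\mathrm{un}})$? It seems plausible
that
$Q_{\mathrm{Grp}}^{\mathrm{un}}\mathrm{FO}_{\mathrm{bit}} \subsetneq \mathrm{FO}_{\mathrm{bit}}(Q_{\mathrm{Grp}}^{\mathrm{un}}) \subsetneq \mathrm{LOGCFL}$,
but we are unable to prove these separations. Since $\mathrm{TC}^0$ is captured
by first-order logic with BIT and majority quantifiers (definable by a
context-free language), we conclude
$\mathrm{TC}^0 \subseteq \mathrm{FO}_{\mathrm{bit}}(Q_{\mathrm{Grp}}^{\mathrm{un}})$;
hence proving the latter separation would prove
$\mathrm{TC}^0 \neq \mathrm{LOGCFL}$, settling a major open question in
complexity theory.

Easily
$\mathrm{CFL} = Q_{\mathrm{Grp}}^{\mathrm{un}}\mathrm{FO} \subsetneq Q_{\mathrm{Grp}}^{\mathrm{un}}\mathrm{FO}_{\mathrm{bit}}$,
since $\{ 0^{2^n} \mid n \in \mathbb{N} \}$ is in the difference of the two
classes. The remainder of this subsection is devoted to documenting a more
complicated setting in which the BIT predicate provably adds expressiveness.

Theorem 17. $\mathrm{FO}_{\mathrm{bit}}(Q_{\mathrm{Grp}}^{\mathrm{un}})$ is not
contained in $\mathrm{FO}(Q_{\mathrm{Grp}}^{\mathrm{un}})$.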

As a particular case we can now solve an open question of [2], addressing the
power of different arities for majority quantifiers.

Corollary 18. Majority of pairs cannot be expressed in first-order logic with
unary majority quantifiers.

Proof. In Theorem 16 it was observed that the unary majority quantifier can be
simulated in $\mathrm{FO}(Q_{\mathrm{Grp}}^{\mathrm{un}})$. On the other hand,
it is known that majority of pairs is sufficient to simulate the BIT predicate.
But as $\mathrm{FO}_{\mathrm{bit}}(Q_{\mathrm{Grp}}^{\mathrm{un}})$ is not
contained in $\mathrm{FO}(Q_{\mathrm{Grp}}^{\mathrm{un}})$, the BIT predicate,
and hence the majority of pairs, is not definable in
$\mathrm{FO}(Q_{\mathrm{Grp}}^{\mathrm{un}})$; hence it cannot be simulated by
unary majority quantifiers.

Corollary 19. Multiplication is not definable in
$\mathrm{FO}(Q_{\mathrm{Grp}}^{\mathrm{un}})$.



5 Conclusion 

Figure 1 depicts the first-order groupoidal-quantifier-based classes studied in
this paper. Together with the new characterization of FO-translations by means
of aperiodic finite transducers, the relationships shown in Fig. 1 summarize our
contribution.

A number of open questions are apparent from Figure 1. Clearly, it would be
nice to separate the $\mathrm{FO}_{\mathrm{bit}}$-based classes, in particular
$\mathrm{FO}_{\mathrm{bit}}(Q_{\mathrm{Grp}}^{\mathrm{un}})$ from
$\mathrm{FO}_{\mathrm{bit}}(Q_{\mathrm{Grp}})$, but this is a daunting task. A
sensible approach then is to begin with
$Q_{\mathrm{Grp}}^{\mathrm{un}}\mathrm{FO}_{\mathrm{bit}}$. How does this
compare with $\mathrm{TC}^0$, for example? Can we at least separate
$Q_{\mathrm{Grp}}^{\mathrm{un}}\mathrm{FO}_{\mathrm{bit}}$ from LOGCFL? We know
that
$Q_{\mathrm{Grp}}^{\mathrm{un}}\mathrm{FO}_{\mathrm{bit}} \not\subseteq \mathrm{FO}(Q_{\mathrm{Grp}}^{\mathrm{un}})$;
a witness for this is the set $\{ 0^{2^n} \mid n \in \mathbb{N} \}$, cf. the
proof of Theorem 17.

An important fundamental question of course is whether we can hope for an
algebraic theory of groupoids to explain the detailed structure of CFL, much in
the way that an elaborate theory of monoids is used in the extensive first-order
parameterization of REG.

Fig. 1. The new landscape. Here G stands for any fixed groupoid, and a thick
line indicates strict inclusion.
Abstract. Model checking is a method for the verification of systems
with respect to their specifications. Symbolic model checking, which
enables the verification of large systems, proceeds by evaluating fixed-point
expressions over the system's set of states. Such evaluation is particularly
simple and efficient when the expressions do not contain alternation
between least and greatest fixed-point operators; namely, when they belong
to the alternation-free μ-calculus (AFMC). Not all specifications,
however, can be translated to AFMC, which is exactly as expressive as weak
monadic second-order logic (WS2S). Rabin showed that a set T of trees
can be expressed in WS2S if and only if both T and its complement can
be recognized by nondeterministic Büchi tree automata. For the "only
if" direction, Rabin constructed, given two nondeterministic Büchi tree
automata U and U' that recognize T and its complement, a WS2S
formula that is satisfied by exactly all trees in T. Since the translation
of WS2S to AFMC is nonelementary, this construction is not practical.
Arnold and Niwiński improved Rabin's construction by a direct
translation of U and U' to AFMC, which involves a doubly-exponential blow-up
and is therefore still impractical. In this paper we describe an
alternative and quadratic translation of U and U' to AFMC. Our translation
goes through weak alternating tree automata, and constitutes a step
towards efficient symbolic model checking of highly expressive specification
formalisms.



1 Introduction 

In model checking, we verify the correctness of a system with respect to a desired 
behavior by checking whether a structure that models the system satisfies a 
formula that specifies this behavior. Commercial model-checking tools need to 
cope with the exceedingly large state-spaces that are present in real-life designs, 
making the so-called state-explosion problem one of the most challenging areas 
in computer-aided verification. One of the most important developments in this 

* Part of this work was done when this author was visiting Cadence Berkeley
Laboratories.
C. Meinel and S. Tison (Eds.): STACS'99, LNCS 1563, pp. 455-^^^ 1999.
(c) Springer-Verlag Berlin Heidelberg 1999



456 Orna Kupferman and Moshe Y. Vardi 



area is the discovery of symbolic model-checking methods [BCM+92, McM93]. In
particular, the use of BDDs for model representation has yielded
model-checking tools that can handle systems with $10^{120}$ states and beyond.

Typically, symbolic model-checking tools proceed by computing fixed-point
expressions over the model's set of states. For example, to find the set of
states from which a state satisfying some predicate p is reachable, the model
checker starts with the set S of states in which p holds, and repeatedly adds to
S the set $\exists\bigcirc S$ of states that have a successor in S. Formally,
the model checker calculates the least fixed-point of the expression
$S = p \vee \exists\bigcirc S$. The evaluation of such expressions is
particularly simple when they contain no alternation between least and greatest
fixed-point operators. Formally, the evaluation of expressions in
alternation-free μ-calculus (AFMC) [Koz83, EL86] can be performed in time that
is linear in both the size of the model and the length of the formula. In
contrast, the evaluation of expressions in which there is a single alternation
takes time that is quadratic in the size of the model. Since the models are very
large, the difference with the linear complexity of AFMC is very significant
[HKSV97]. Hence, it is desirable to translate specifications to AFMC. Not all
specifications, however, can be translated to AFMC [KV98a], and known
translations to AFMC involve a blow-up that makes them impractical. In this
paper we describe an alternative translation of specifications to AFMC.
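The reachability computation described above is a plain least fixed-point iteration. The sketch below works on an explicitly enumerated state graph rather than on a symbolic (BDD-based) representation, so it only illustrates the iteration itself; the function names and the dictionary encoding of the successor relation are assumptions made for the example.

```python
def pre_exists(S, succ):
    """States that have at least one successor in S."""
    return {s for s, targets in succ.items() if targets & S}


def backward_reach(p_states, succ):
    """Least fixed point of S = p OR pre_exists(S): all states from which
    some state satisfying p is reachable."""
    S = set(p_states)
    while True:
        bigger = S | pre_exists(S, succ)
        if bigger == S:          # no new states: the fixed point is reached
            return S
        S = bigger


# Hypothetical four-state model: 0 -> 1 -> 2 -> 2 and an isolated loop on 3.
MODEL = {0: {1}, 1: {2}, 2: {2}, 3: {3}}
```

On `MODEL`, starting from the states where p holds, each round adds the states one step further back, and the loop stops as soon as a round adds nothing.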

Second-order logic is a powerful formalism for expressing properties of
sequences and trees. We can view all common program logics as fragments of
second-order logic. Second-order logic also serves as the specification language
in the model-checking tool MONA [EKM98, Kla98]. While in first-order logic
one can only quantify individual variables, second-order logic enables also the
quantification of sets¹. For example, the formula

$\exists X.\ \epsilon \in X \wedge \forall z (z \in X \leftrightarrow \neg(succ(z) \in X)) \wedge \forall z (z \in X \rightarrow P(z))$

specifies sequences in which P holds at all even positions. We distinguish
between two types of logic, linear and branching. In second-order logic with one
successor (S1S), the formulas describe sequences and contain, as in the example
above, the successor operator succ. In second-order logic with two successors
(S2S), formulas describe trees and contain both left-successor and
right-successor operators. For example, the S2S formula

$\exists X.\ \epsilon \in X \wedge \forall z (z \in X \rightarrow (P(z) \wedge (succ_l(z) \in X \vee succ_r(z) \in X)))$

specifies trees in which P holds along at least one path.

Second-order logic motivated the introduction and study of finite automata
on infinite objects. Like automata on finite objects, automata on infinite
objects either accept or reject their input. Since a run on an infinite object
does not have a final state, acceptance is determined with respect to the set of
states visited infinitely often during the run. For example, in Büchi automata,
some of the states are designated as accepting states, and a run is accepting
iff it visits states from the accepting set infinitely often [Büc62] (when the
run is a tree, it is required to visit infinitely many accepting states along
each path). More general are Rabin automata, whose acceptance conditions involve
a set of pairs of sets of states. The tight relation between automata on
infinite objects and second-order logic was first established for the linear
paradigm. In [Büc62], Büchi translated S1S formulas to nondeterministic Büchi
word automata. Then, in [Rab69], Rabin translated S2S formulas to
nondeterministic Rabin tree automata. These fundamental works led to the
solution of the decision problem for S1S and S2S, and were the key to the
solution of many more problems in mathematical logic.

¹ Thus, we consider monadic second-order logic, where quantification is over
unary relations.

Recall that we are looking for a fragment of S2S that can be translated to
AFMC. Known results about the expressive power of different types of automata
enabled the study of definability of properties within fragments of second-order
logic. In [Rab70], Rabin showed that nondeterministic Büchi tree automata are
strictly less expressive than nondeterministic Rabin tree automata, and that
they are not closed under complementation. Rabin also showed that for every set
T of trees, both T and its complement can be recognized by nondeterministic
Büchi tree automata iff T can be specified in a fragment of S2S, called weak
second-order logic (WS2S), in which set quantification is restricted to finite
sets. For the "only if" direction, Rabin constructed, given two nondeterministic
Büchi tree automata U and U' that recognize T and its complement, a WS2S formula
that is satisfied by exactly all trees in T.

It turned out that WS2S is exactly the fragment of S2S we are looking for,
thus WS2S = AFMC. In [AN92], Arnold and Niwiński showed that every AFMC
formula can be translated to an equivalent WS2S formula. For the other
direction, they constructed, given U and U' as above, an AFMC formula that is
satisfied by exactly all trees accepted by U. The translation in [AN92] is
doubly exponential. Thus, if U and U' have n and m states, respectively, the
length of the AFMC formula is doubly exponential in n and m. While this improves
the nonelementary translation of Rabin's WS2S formula to AFMC, it is still not
useful in practice. In this paper we present a quadratic translation of U and U'
to an AFMC formula that is satisfied by exactly all trees accepted by U. Our
translation goes through weak alternating automata [MSS86]. Thus, while the
characterizations of WS2S discussed above go from logic to automata and then
back to logic, our construction provides a clean, purely automata-theoretic,
characterization of WS2S.

In an alternating automaton, the transition function can induce both
existential and universal requirements on the automaton. For example, a
transition $\delta(q, \sigma) = q_1 \vee (q_2 \wedge q_3)$ of an alternating
word automaton means that the automaton in state q accepts a word
$\sigma \cdot \tau$ iff it accepts the suffix $\tau$ either from state $q_1$ or
from both states $q_2$ and $q_3$. In a weak automaton, the automaton's set of
states is partitioned into partially ordered sets. Each set is classified as
accepting or rejecting. The transition function is restricted so that in each
transition, the automaton either stays at the same set or moves to a set smaller
in the partial order. Thus, each run of a weak automaton eventually gets trapped
in some set in the partition. Acceptance is then determined according to the
classification of this set. It is shown in [MSS86] that formulas of WS2S can be
translated to weak alternating tree automata. Moreover, weak alternating
automata can be linearly translated to AFMC.

Given two nondeterministic Büchi tree automata U and U' that recognize a
language and its complement, we construct a weak alternating tree automaton
A equivalent to U. The number of states in A is quadratic in the number of
states of U and U'. Precisely, if U and U' have n and m states, respectively,
the automaton A has $(nm)^2$ states. The linear translation of weak alternating
tree automata to AFMC then completes a translation to AFMC of the same
complexity. Our translation can be viewed as a step towards efficient symbolic
model checking of highly expressive specification formalisms such as the
fragment of the branching temporal logic CTL* that can be translated to WS2S. A
step that is still missing in order to complete this goal is a translation of
CTL* formulas to nondeterministic Büchi tree automata, when such a translation
exists. From a theoretical point of view, our translation completes the picture
of "quadratic weakening" in both the linear and the branching paradigm. The
equivalence in expressive power of nondeterministic Büchi and Rabin word
automata [McN66] implied that WS1S is as expressive as S1S. The latter
equivalence is supported by an automaton construction: given a nondeterministic
Büchi word automaton, one can construct an equivalent weak alternating word
automaton of quadratic size [KV97]. In the branching paradigm, WS2S is strictly
less expressive than S2S, and a nondeterministic Büchi tree automaton can be
translated to a weak alternating tree automaton only if its complement can also
be recognized by a nondeterministic Büchi tree automaton. It follows from our
construction that the size of the equivalent weak alternating automaton is then
quadratic in the sizes of the two automata.

2 Tree Automata 

A full infinite binary tree (tree) is the set $T = \{l, r\}^*$. The elements of
T are called nodes, and the empty word $\epsilon$ is the root of T. For every
$x \in T$, the nodes $x \cdot l$ and $x \cdot r$ are the successors of x. A path
$\pi$ of a tree T is a set $\pi \subset T$ such that $\epsilon \in \pi$ and for
every $x \in \pi$, exactly one successor of x is in $\pi$. For two nodes $x_1$
and $x_2$ of T, we say that $x_1 \leq x_2$ iff $x_1$ is a prefix of $x_2$; i.e.,
there exists $z \in \{l, r\}^*$ such that $x_2 = x_1 \cdot z$. We say that
$x_1 < x_2$ iff $x_1 \leq x_2$ and $x_1 \neq x_2$. A frontier of an infinite
tree is a set $E \subset T$ of nodes such that for every path $\pi \subset T$,
we have $|\pi \cap E| = 1$. For example, the set $E = \{l, rll, rlr, rr\}$ is a
frontier. For two frontiers $E_1$ and $E_2$, we say that $E_1 \leq E_2$ iff for
every node $x_2 \in E_2$, there exists a node $x_1 \in E_1$ such that
$x_1 \leq x_2$. We say that $E_1 < E_2$ iff for every node $x_2 \in E_2$, there
exists a node $x_1 \in E_1$ such that $x_1 < x_2$. Note that while $E_1 < E_2$
implies that $E_1 \leq E_2$ and $E_1 \neq E_2$, the other direction does not
necessarily hold. Given an alphabet $\Sigma$, a $\Sigma$-labeled tree is a pair
$\langle T, V \rangle$ where T is a tree and $V : T \to \Sigma$ maps each node
of T to a letter in $\Sigma$. We denote by $V_\Sigma$ the set of all
$\Sigma$-labeled trees. For a $\Sigma$-labeled tree $\langle T, V \rangle$ and a
set $A \subseteq \Sigma$, we say that E is an A-frontier iff E is a frontier and
for every node $x \in E$, we have $V(x) \in A$.
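The frontier conditions above can be checked mechanically for finite sets of nodes written as strings over the letters `l` and `r`. The sketch below relies on a standard fact that the paper does not state: a finite set of nodes meets every path exactly once iff no node in it is a proper prefix of another and the weights 2^(-|x|) of its nodes sum to 1. The function names are assumptions made for the example.

```python
from fractions import Fraction


def is_frontier(E):
    """True iff the finite node set E meets every infinite path exactly once."""
    E = set(E)
    prefix_free = not any(x != y and y.startswith(x) for x in E for y in E)
    covers_all_paths = sum(Fraction(1, 2 ** len(x)) for x in E) == 1
    return prefix_free and covers_all_paths


def frontier_leq(E1, E2):
    """E1 <= E2: every node of E2 has a (not necessarily proper) prefix in E1."""
    return all(any(x2.startswith(x1) for x1 in E1) for x2 in E2)


def frontier_lt(E1, E2):
    """E1 < E2: every node of E2 has a proper prefix in E1."""
    return all(any(x2 != x1 and x2.startswith(x1) for x1 in E1) for x2 in E2)
```

For instance, with `E1 = {"l", "r"}` and `E2 = {"l", "rl", "rr"}` we get `frontier_leq(E1, E2)` and `E1 != E2`, yet `frontier_lt(E1, E2)` fails because the node `l` of `E2` has no proper prefix in `E1`; this is the situation the note about the converse direction refers to.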

Automata on infinite trees (tree automata) run on infinite $\Sigma$-labeled
trees. We first define nondeterministic Büchi tree automata (NBT). An NBT is
$U = \langle \Sigma, Q, \delta, q_0, F \rangle$ where $\Sigma$ is the input
alphabet, Q is a finite set of states,
$\delta : Q \times \Sigma \to 2^{Q \times Q}$ is a transition function,
$q_0 \in Q$ is an initial state, and $F \subseteq Q$ is a Büchi acceptance
condition. Intuitively, each pair in $\delta(q, \sigma)$ suggests a
nondeterministic choice for the automaton's next configuration. When the
automaton is in a state q as it reads a node x labeled by a letter $\sigma$, it
proceeds by first choosing a pair
$\langle q_l, q_r \rangle \in \delta(q, \sigma)$, and then splitting into two
copies. One copy enters the state $q_l$ and proceeds to the node $x \cdot l$
(the left successor of x), and the other copy enters the state $q_r$ and
proceeds to the node $x \cdot r$ (the right successor of x). Formally, a run of
U on an input tree $\langle T, V \rangle$ is a Q-labeled tree
$\langle T, r \rangle$, such that the following hold:

- $r(\epsilon) = q_0$.

- Let $x \in T$ with $r(x) = q$. There exists
  $\langle q_l, q_r \rangle \in \delta(q, V(x))$ such that $r(x \cdot l) = q_l$
  and $r(x \cdot r) = q_r$.

Note that each node of the input tree corresponds to exactly one node in the
run tree.

Given a run $\langle T, r \rangle$ and a path $\pi \subset T$, let
$inf(\pi) \subseteq Q$ be such that $q \in inf(\pi)$ if and only if there are
infinitely many $x \in \pi$ for which $r(x) = q$. That is, $inf(\pi)$ contains
exactly all the states that are visited infinitely often in $\pi$. A path $\pi$
satisfies a Büchi acceptance condition $F \subseteq Q$ if and only if
$inf(\pi) \cap F \neq \emptyset$. A run $\langle T, r \rangle$ is accepting iff
all its paths satisfy the acceptance condition. Equivalently,
$\langle T, r \rangle$ is accepting iff $\langle T, r \rangle$ contains
infinitely many F-frontiers $G_0 < G_1 < \ldots$. A tree $\langle T, V \rangle$
is accepted by U iff there exists an accepting run of U on
$\langle T, V \rangle$, in which case $\langle T, V \rangle$ belongs to the
language, $\mathcal{L}(U)$, of U. We say that a set $\mathcal{T}$ of trees is in
NBT iff there exists an NBT U such that $\mathcal{L}(U) = \mathcal{T}$. We say
that $\mathcal{T}$ is in co-NBT iff the complement of $\mathcal{T}$ is in NBT;
i.e., there exists an NBT U such that
$\mathcal{L}(U) = V_\Sigma \setminus \mathcal{T}$.

Alternating tree automata generalize nondeterministic tree automata and
were first introduced in [MS87]. In order to define alternating tree automata,
we first need some notations. For a given set X, let $B^+(X)$ be the set of
positive Boolean formulas over X (i.e., Boolean formulas built from elements in
X using $\wedge$ and $\vee$), where we also allow the formulas true and false
and, as usual, $\wedge$ has precedence over $\vee$. For a set $Y \subseteq X$
and a formula $\theta \in B^+(X)$, we say that Y satisfies $\theta$ iff
assigning true to elements in Y and assigning false to elements in
$X \setminus Y$ satisfies $\theta$.
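Satisfaction of a positive Boolean formula by a set Y can be evaluated by structural recursion. The sketch below uses small immutable classes for atoms, conjunction and disjunction, with the Python constants `True` and `False` standing for the formulas true and false; the class names, the function `satisfies`, and the example formula are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    name: object          # an element of the underlying set X


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Or:
    left: object
    right: object


def satisfies(Y, theta):
    """True iff assigning true to the elements of Y (and false to all other
    atoms) makes the positive Boolean formula `theta` true."""
    if isinstance(theta, bool):
        return theta
    if isinstance(theta, Atom):
        return theta.name in Y
    if isinstance(theta, And):
        return satisfies(Y, theta.left) and satisfies(Y, theta.right)
    if isinstance(theta, Or):
        return satisfies(Y, theta.left) or satisfies(Y, theta.right)
    raise TypeError(f"not a positive Boolean formula: {theta!r}")


# Example formula q1 OR (q2 AND q3).
THETA = Or(Atom("q1"), And(Atom("q2"), Atom("q3")))
```

Because the formulas contain no negation, satisfaction is monotone: any superset of a satisfying set is again satisfying, which is why a run may always pick a minimal satisfying set.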

A finite alternating automaton over infinite binary trees is
$A = \langle \Sigma, Q, \delta, q_0, F \rangle$ where $\Sigma$, Q, $q_0$, and F
are as in NBT, and $\delta : Q \times \Sigma \to B^+(\{l, r\} \times Q)$ is a
transition function. A run of an alternating automaton A over a tree
$\langle T, V \rangle$ is a $(T \times Q)$-labeled tree
$\langle T_r, r \rangle$. The tree $T_r$ is not necessarily binary and it may
have states with no successors. Thus, $T_r \subset \mathbb{N}^*$ is such that if
$x \cdot c \in T_r$ where $x \in \mathbb{N}^*$ and $c \in \mathbb{N}$, then also
$x \in T_r$. For every $x \in T_r$, the nodes $x \cdot c$, with
$c \in \mathbb{N}$, are the successors of x. Each node of $T_r$ corresponds to a
node of T. A






node in $T_r$, labeled by $(x, q)$, describes a copy of the automaton that reads
the node x of T and visits the state q. Note that many nodes of $T_r$ can
correspond to the same node of T; in contrast, in a run of a nondeterministic
automaton over $\langle T, V \rangle$ there is a one-to-one correspondence
between the nodes of the run and the nodes of the tree. The labels of a node and
its successors have to satisfy the transition function. Formally,
$\langle T_r, r \rangle$ satisfies the following:

1. $\epsilon \in T_r$ and $r(\epsilon) = (\epsilon, q_0)$.

2. Let $y \in T_r$ with $r(y) = (x, q)$ and $\delta(q, V(x)) = \theta$. Then
   there is a possibly empty set
   $S = \{(c_0, q_0), (c_1, q_1), \ldots, (c_n, q_n)\} \subseteq \{l, r\} \times Q$,
   such that the following hold:

   - S satisfies $\theta$, and

   - for all $0 \leq i \leq n$, we have $y \cdot i \in T_r$ and
     $r(y \cdot i) = (x \cdot c_i, q_i)$.

For a run $\langle T_r, r \rangle$ and an infinite path $\pi \subset T_r$, we
define $inf(\pi)$ to be the set of states that are visited infinitely often in
$\pi$, thus $q \in inf(\pi)$ if and only if there are infinitely many
$y \in \pi$ for which $r(y) \in T \times \{q\}$. A run $\langle T_r, r \rangle$
is accepting if all its infinite paths satisfy the Büchi acceptance condition.
As with NBT, a tree $\langle T, V \rangle$ is accepted by A iff there exists an
accepting run of A on $\langle T, V \rangle$, in which case
$\langle T, V \rangle$ belongs to $\mathcal{L}(A)$.

Example 1. We define an alternating Büchi tree automaton A that accepts
exactly all {a, b, c}-labeled binary trees in which all paths have a node
labeled a and there exists a path with two successive b labels. Let
$A = \langle \{a, b, c\}, \{q_0, q_1, q_2, q_3\}, \delta, q_0, \emptyset \rangle$,
where $\delta$ is defined in the table below.



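$q$   | $\delta(q, a)$         | $\delta(q, b)$                                         | $\delta(q, c)$
$q_0$ | $(0,q_1) \vee (1,q_1)$ | $(0,q_3) \wedge (1,q_3) \wedge ((0,q_2) \vee (1,q_2))$ | $(0,q_3) \wedge (1,q_3) \wedge ((0,q_1) \vee (1,q_1))$
$q_1$ | $(0,q_1) \vee (1,q_1)$ | $(0,q_2) \vee (1,q_2)$                                 | $(0,q_1) \vee (1,q_1)$
$q_2$ | $(0,q_1) \vee (1,q_1)$ | true                                                   | $(0,q_1) \vee (1,q_1)$
$q_3$ | true                   | $(0,q_3) \wedge (1,q_3)$                               | $(0,q_3) \wedge (1,q_3)$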

In the state $q_0$, the automaton checks both requirements. If a is true, only
the second requirement is left to be checked. This is done by sending a copy in
state $q_1$, which searches for two successive b's in some branch, to either the
left or the right child. If b is true, A needs to send more copies. First, it
needs to check that all paths in the left and right subtrees have a node labeled
a. This is done by sending copies in state $q_3$ to both the left and the right
children. Second, it needs to check that one of these subtrees contains two
successive b's. This is done (keeping in mind that the just read b may be the
first b in a sequence of two b's) by sending a copy in state $q_2$ to one of the
children. Similarly, if c is true, A sends copies that check both requirements.
As before, a requirement about a is sent universally and a requirement about the
b's is sent existentially. □

In [MSS86], Muller et al. introduce weak alternating tree automata (AWT).
In an AWT, we have a Büchi acceptance condition $F \subseteq Q$ and there exists
a partition of Q into disjoint sets, $Q_i$, such that for each set $Q_i$, either
$Q_i \subseteq F$, in which case $Q_i$ is an accepting set, or
$Q_i \cap F = \emptyset$, in which case $Q_i$ is a rejecting set. In addition,
there exists a partial order $\leq$ on the collection of the $Q_i$'s such that
for every $q \in Q_i$ and $q' \in Q_j$ for which q' occurs in
$\delta(q, \sigma)$, for some $\sigma \in \Sigma$, we have $Q_j \leq Q_i$. Thus,
transitions from a state in $Q_i$ lead to states in either the same $Q_i$ or a
lower one. It follows that every infinite path of a run of an AWT ultimately
gets "trapped" within some $Q_i$. The path then satisfies the acceptance
condition if and only if $Q_i$ is an accepting set. Indeed, a run visits
infinitely many states in F if and only if it gets trapped in an accepting set.
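Whether a given partition witnesses weakness can be checked directly: every block must lie inside F or be disjoint from F, and a suitable partial order exists exactly when the graph induced on the blocks has no cycle through two different blocks. The sketch below is an illustration under assumptions; the function name `is_weak`, the `successors` map (each state mapped to the states occurring in its transitions), and the encoding of the automaton of Example 1 as read from its transition table are not part of the paper.

```python
def is_weak(partition, F, successors):
    """True iff `partition` (a list of sets of states) satisfies the AWT
    conditions for acceptance set F and the given successor relation."""
    F = set(F)
    block_of = {q: i for i, block in enumerate(partition) for q in block}
    # Each block must be entirely accepting or entirely rejecting.
    if any((set(block) & F) and not set(block) <= F for block in partition):
        return False
    # Edges between different blocks induced by the transitions.
    edges = {i: set() for i in range(len(partition))}
    for q, targets in successors.items():
        for t in targets:
            if block_of[q] != block_of[t]:
                edges[block_of[q]].add(block_of[t])
    # The block graph must be acyclic: repeatedly peel off blocks that have
    # no edge into a remaining block.
    remaining = set(edges)
    while remaining:
        sinks = {i for i in remaining if not (edges[i] & remaining)}
        if not sinks:
            return False
        remaining -= sinks
    return True


# States occurring in the transitions of each state of Example 1.
EXAMPLE_SUCC = {
    "q0": {"q1", "q2", "q3"},
    "q1": {"q1", "q2"},
    "q2": {"q1"},
    "q3": {"q3"},
}
```

Under this reading, Example 1 is weak with the partition {q0}, {q1, q2}, {q3}, while the partition into singletons fails because q1 and q2 reach each other.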



2.1 Traps 

Let $U = \langle \Sigma, S, s_0, M, F \rangle$ and
$U' = \langle \Sigma, S', s'_0, M', F' \rangle$ be two NBT, and let
$|S| \cdot |S'| = m$. In [Rab70], Rabin studies the composition of a run of U
with a run of U'. Recall that an accepting run of U contains infinitely many
F-frontiers $G_0 < G_1 < \ldots$, and an accepting run of U' contains infinitely
many F'-frontiers $G'_0 < G'_1 < \ldots$. It follows that for every tree
$\langle T, V \rangle \in \mathcal{L}(U) \cap \mathcal{L}(U')$ and accepting
runs $\langle T, r \rangle$ and $\langle T, r' \rangle$ of U and U' on
$\langle T, V \rangle$, the composition of $\langle T, r \rangle$ and
$\langle T, r' \rangle$ contains infinitely many frontiers $E_i \subset T$, with
$E_i < E_{i+1}$, such that $\langle T, r \rangle$ reaches an F-frontier and
$\langle T, r' \rangle$ reaches an F'-frontier between $E_i$ and $E_{i+1}$.
Rabin shows that the existence of m such frontiers, in the composition of some
runs of U and U', is sufficient to imply that the intersection
$\mathcal{L}(U) \cap \mathcal{L}(U')$ is not empty. Below we repeat Rabin's
result, with some different notations.

Let U and U' be as above. We say that a sequence $E_0, \ldots, E_m$ of frontiers
of T is a trap for U and U' iff $E_0 = \{\epsilon\}$ and there exists a tree
$\langle T, V \rangle$ and runs $\langle T, r \rangle$ and
$\langle T, r' \rangle$ of U and U' on $\langle T, V \rangle$, such that for
every $0 \leq i \leq m - 1$, the following hold.

- $\langle T, r \rangle$ contains an F-frontier $G_i$ such that
  $E_i \leq G_i < E_{i+1}$, and

- $\langle T, r' \rangle$ contains an F'-frontier $G'_i$ such that
  $E_i \leq G'_i < E_{i+1}$.

We say that $\langle T, r \rangle$ and $\langle T, r' \rangle$ witness the trap
for U and U'.

Theorem 1. [Rab70] Consider two nondeterministic Büchi tree automata U and U'.
If there exists a trap for U and U', then
$\mathcal{L}(U) \cap \mathcal{L}(U')$ is not empty.

3 From NBT and Co-NBT to AWT

Theorem 2. Let U and U' be two NBT with
$\mathcal{L}(U') = V_\Sigma \setminus \mathcal{L}(U)$. There exists an AWT A
such that $\mathcal{L}(A) = \mathcal{L}(U)$ and the size of A is quadratic in
the sizes of U and U'.

Proof. Let $U = \langle \Sigma, S, s_0, M, F \rangle$ and
$U' = \langle \Sigma, S', s'_0, M', F' \rangle$, and let $|S| \cdot |S'| = m$.
We define the AWT $A = \langle \Sigma, Q, \delta, q_0, \alpha \rangle$ as
follows.

- $Q = ((S \times \{\bot, \top\}) \times S' \times \{0, \ldots, m\}) \setminus ((S \times \{\top\}) \times S' \times \{m\})$.
  Intuitively, a copy of A that visits the state $((s, \gamma), s', i)$ as it
  reads the node x of the input tree corresponds to runs r and r' of U and U'
  that visit the states s and s', respectively, as they read the node x of the
  input tree. Let $\rho = y_0, y_1, \ldots, y_{|x|}$ be the path from $\epsilon$
  to x. Consider the joint behavior of r and r' on $\rho$. We can represent this
  behavior by a sequence
  $\tau_\rho = \langle t_0, t'_0 \rangle, \langle t_1, t'_1 \rangle, \ldots, \langle t_{|x|}, t'_{|x|} \rangle$
  of pairs in $S \times S'$ where $t_j = r(y_j)$ and $t'_j = r'(y_j)$. We say
  that a pair $\langle t, t' \rangle \in S \times S'$ is an F-pair iff
  $t \in F$ and is an F'-pair iff $t' \in F'$. We can partition the sequence
  $\tau_\rho$ into blocks $\beta_0, \beta_1, \ldots, \beta_i$ such that we close
  block $\beta_k$ and open block $\beta_{k+1}$ whenever we reach the first
  F'-pair that is preceded by an F-pair in $\beta_k$. In other words, whenever
  we open a block, we first look for an F-pair, ignoring F'-pairness. Once an
  F-pair is detected, we look for an F'-pair, ignoring F-pairness. Once an
  F'-pair is detected, we close the current block and we open a new block. Note
  that a block may contain a single pair that is both an F-pair and an F'-pair.
  The number i in $((s, \gamma), s', i)$ is the index of the last block in
  $\tau_\rho$. The status $\gamma \in \{\bot, \top\}$ indicates whether the
  block $\beta_i$ already contains an F-pair, in which case $\gamma = \top$, or
  $\beta_i$ does not contain an F-pair, in which case $\gamma = \bot$.

  For a status $\gamma \in \{\bot, \top\}$ and an index
  $i \in \{0, \ldots, m\}$, let
  $Q_{\gamma,i} = (S \times \{\gamma\}) \times S' \times \{i\}$.

- $q_0 = ((s_0, \bot), s'_0, 0)$.

— In order to define the transition function S, we first define two functions, 

new^ : Q \ Q±^m |-L, T } and newi : Q \ Q±^m (0 , . . . , to}, as follows. 



new7(((s,7),s',f)) 



T If s' ^ F' and (7 = T or s G S'). 
T Otherwise. 



nem(((s,7),s',z)) 



z + I If s' G F' and (7 = T or s G F). 
z Otherwise. 



Intuitively, new^ and newi are responsible for the recording and tracking of 
blocks. Recall that the status 7 indicates whether an F-pair in the current 
block has already been detected. As such, the new status is T whenever s is 
in F or 7 is T, unless s' is in F', in which case (s, s') is the last pair in the 
current block, and the new status is T. Similarly, the index z is increased 
to z + 1 whenever we detect an F'-pair that is either also an F-pair or, as 
indicated by 7, preceded by an F-pair in the same block. 
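
The two bookkeeping functions can be sketched in a few lines of Python. This is only an illustrative encoding, not part of the construction itself: the tuple layout `((s, gamma), s_prime, i)`, the constants `BOT` and `TOP`, and the convention of passing the accepting sets as Python sets are all assumptions made for the example. The domain restriction to states outside the set with status bottom and index m is not checked.

```python
BOT, TOP = "bot", "top"  # hypothetical encodings of the two status values


def new_gamma(q, F, F_prime):
    """Status of the successor states of q = ((s, gamma), s_prime, i)."""
    (s, gamma), s_prime, _i = q
    # An F-pair has been seen in the current block and the block stays open.
    if s_prime not in F_prime and (gamma == TOP or s in F):
        return TOP
    # Either no F-pair yet, or an F'-pair just closed the block.
    return BOT


def new_i(q, F, F_prime):
    """Block index of the successor states of q = ((s, gamma), s_prime, i)."""
    (s, gamma), s_prime, i = q
    # The block closes on an F'-pair that is, or is preceded by, an F-pair.
    if s_prime in F_prime and (gamma == TOP or s in F):
        return i + 1
    return i
```

For instance, with `F = {"f"}` and `F_prime = {"g"}`, the state `(("f", BOT), "g", 2)` is a single pair that is both an F-pair and an F'-pair, so the block closes immediately: the new status is `BOT` and the new index is 3.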

  The automaton $\mathcal{A}$ proceeds as follows. Essentially, for every run $\langle T, r' \rangle$ of
  $\mathcal{U}'$, the automaton $\mathcal{A}$ guesses a run $\langle T, r \rangle$ of $\mathcal{U}$ such that for every path $\rho$
  of $T$, the run $\langle T, r \rangle$ visits $F$ along $\rho$ at least as many times as $\langle T, r' \rangle$ visits
  $F'$ along $\rho$. Since $\mathcal{L}(\mathcal{U}) \cap \mathcal{L}(\mathcal{U}') = \emptyset$, no run $\langle T, r \rangle$ can witness with $\langle T, r' \rangle$
  a trap for $\mathcal{U}$ and $\mathcal{U}'$. Consequently, recording of visits to $F$ and $F'$ along $\rho$
  can be completed once $\mathcal{A}$ detects that $\tau_\rho$ contains $m$ blocks as above.
  Formally, let $q = ((s, \gamma), s', i)$ be such that $M(s, a) = \{\langle u_1, v_1 \rangle, \ldots, \langle u_n, v_n \rangle\}$
  and $M'(s', a) = \{\langle u'_1, v'_1 \rangle, \ldots, \langle u'_{n'}, v'_{n'} \rangle\}$. We distinguish between two cases.

  • If $q \notin Q_{\bot,m}$, then $\delta(q, a)$ is

    \[ \bigwedge_{1 \le k \le n'} \bigvee_{1 \le p \le n}
       (l, ((u_p, new_\gamma(q)), u'_k, new_i(q))) \wedge (r, ((v_p, new_\gamma(q)), v'_k, new_i(q))). \]

  • If $q \in Q_{\bot,m}$, then $\delta(q, a)$ is

    * $\bigwedge_{1 \le k \le n'} \bigvee_{1 \le p \le n} (l, ((u_p, \bot), u'_k, m)) \wedge (r, ((v_p, \bot), v'_k, m))$, if $s \notin F$;

    * true, otherwise.

- $\alpha = (S \times \{\top\}) \times S' \times \{0, \ldots, m - 1\}$. Thus, $\alpha$ makes sure that infinite paths
  of the run visit infinitely many states in which the status is $\top$.

The automaton $\mathcal{A}$ is indeed an AWT. Clearly, each set $Q_{\gamma,i}$ is either contained
in $\alpha$ or is disjoint from $\alpha$. The partial order on the sets is defined by
$Q_{\gamma',i'} \le Q_{\gamma,i}$ iff either $i < i'$, or $i = i'$ and $\gamma' = \top$. Note that, by the definition
of $\alpha$, a run is accepting iff no path of it gets trapped in a set of the form $Q_{\bot,i}$,
namely a set in which $\mathcal{A}$ is waiting for a visit of $\mathcal{U}$ in a state in $F$. The size of
$\mathcal{A}$ is $O(m^2)$.

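We prove that $\mathcal{L}(\mathcal{A}) = \mathcal{L}(\mathcal{U})$. We first prove that $\mathcal{L}(\mathcal{U}) \subseteq \mathcal{L}(\mathcal{A})$. Consider
a tree $\langle T, V \rangle$. With every run $\langle T, r \rangle$ of $\mathcal{U}$ on $\langle T, V \rangle$ we can associate a single
run $\langle T_R, R \rangle$ of $\mathcal{A}$ on $\langle T, V \rangle$. Intuitively, the run $\langle T, r \rangle$ directs $\langle T_R, R \rangle$ in the
only nondeterminism in $\delta$. Formally, recall that a run of $\mathcal{A}$ on a tree $\langle T, V \rangle$
is a $(T \times Q)$-labeled tree $\langle T_R, R \rangle$, where a node $y \in T_R$ with $R(y) = (x, q)$
corresponds to a copy of $\mathcal{A}$ that reads the node $x \in T$ and visits the state $q$. We
define $\langle T_R, R \rangle$ as follows.

- $\varepsilon \in T_R$ and $R(\varepsilon) = (\varepsilon, ((s_0, \bot), s'_0, 0))$.

- Consider a node $y \in T_R$ with $R(y) = (x, ((s, \gamma), s', i))$. By the definition of
  $\langle T_R, R \rangle$ so far, we have $r(x) = s$. Let $r(x \cdot l) = u$ and $r(x \cdot r) = v$. Also,
  let $M'(s', V(x)) = \{\langle u'_1, v'_1 \rangle, \ldots, \langle u'_{n'}, v'_{n'} \rangle\}$, $\gamma' = new_\gamma(((s, \gamma), s', i))$, and
  $i' = new_i(((s, \gamma), s', i))$. We define
  $S = \{(l, ((u, \gamma'), u'_1, i')), (r, ((v, \gamma'), v'_1, i')), \ldots, (l, ((u, \gamma'), u'_{n'}, i')), (r, ((v, \gamma'), v'_{n'}, i'))\}$.
  By the definition of $\delta$, the set $S$ satisfies $\delta(((s, \gamma), s', i), V(x))$. For all
  $0 \le j \le n' - 1$, we have $y \cdot 2j \in T_R$ with $R(y \cdot 2j) = (x \cdot l, ((u, \gamma'), u'_{j+1}, i'))$, and
  $y \cdot (2j + 1) \in T_R$ with $R(y \cdot (2j + 1)) = (x \cdot r, ((v, \gamma'), v'_{j+1}, i'))$.

Consider a tree $\langle T, V \rangle \in \mathcal{L}(\mathcal{U})$. Let $\langle T, r \rangle$ be an accepting run of $\mathcal{U}$ on $\langle T, V \rangle$,
and let $\langle T_R, R \rangle$ be the run of $\mathcal{A}$ on $\langle T, V \rangle$ induced by $\langle T, r \rangle$. It is easy to see
that $\langle T_R, R \rangle$ is accepting. Indeed, as $\langle T, r \rangle$ contains infinitely many $F$-frontiers,
no infinite path of $\langle T_R, R \rangle$ can get trapped in a set $Q_{\bot,i}$.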

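It is left to prove that $\mathcal{L}(\mathcal{A}) \subseteq \mathcal{L}(\mathcal{U})$. For that, we prove that
$\mathcal{L}(\mathcal{A}) \cap \mathcal{L}(\mathcal{U}') = \emptyset$. Since $\mathcal{L}(\mathcal{U}) = V_\Sigma \setminus \mathcal{L}(\mathcal{U}')$, it follows that every tree that is
accepted by $\mathcal{A}$ is also accepted by $\mathcal{U}$. Consider a tree $\langle T, V \rangle$. With each run
$\langle T_R, R \rangle$ of $\mathcal{A}$ on $\langle T, V \rangle$ and run $\langle T, r' \rangle$ of $\mathcal{U}'$ on $\langle T, V \rangle$, we can associate a run
$\langle T, r \rangle$ of $\mathcal{U}$ on $\langle T, V \rangle$. Intuitively, $\langle T, r \rangle$ makes the choices that $\langle T_R, R \rangle$ has made
in its copies that correspond to the run $\langle T, r' \rangle$. Formally, we define $\langle T, r \rangle$ as follows.

- $r(\varepsilon) = s_0$.

- Consider a node $x \in T$ with $r(x) = s$. Let $r'(x) = s'$. The run $\langle T, r' \rangle$ fixes
  a pair $\langle u', v' \rangle \in M'(s', V(x))$ that $\mathcal{U}'$ proceeds with when it reads the node
  $x$. Formally, let $\langle u', v' \rangle$ be such that $r'(x \cdot l) = u'$ and $r'(x \cdot r) = v'$. By
  the definition of $r(x)$ so far, the run $\langle T_R, R \rangle$ contains a node $y \in T_R$ with
  $R(y) = (x, ((s, \gamma), s', i))$ for some $\gamma$ and $i$. If $\delta(((s, \gamma), s', i), V(x)) = $ true,
  we define the remainder of $\langle T, r \rangle$ arbitrarily. Otherwise, by the definition of
  $\delta$, the successors of $y$ in $T_R$ fix the pair in $M(s, V(x))$ that $\mathcal{A}$ proceeds with
  per each pair in $M'(s', V(x))$. In particular, $T_R$ contains at least two nodes
  $y \cdot c_1$ and $y \cdot c_2$ such that $R(y \cdot c_1) = (x \cdot l, ((u, \gamma'), u', i'))$ and
  $R(y \cdot c_2) = (x \cdot r, ((v, \gamma'), v', i'))$, for some $\gamma'$ and $i'$. We then define $r(x \cdot l) = u$ and
  $r(x \cdot r) = v$.

We can now prove that $\mathcal{L}(\mathcal{A}) \cap \mathcal{L}(\mathcal{U}') = \emptyset$. Assume, by way of contradiction,
that there exists a tree $\langle T, V \rangle$ such that $\langle T, V \rangle$ is accepted by both $\mathcal{A}$ and $\mathcal{U}'$. Let
$\langle T_R, R \rangle$ and $\langle T, r' \rangle$ be the accepting runs of $\mathcal{A}$ and $\mathcal{U}'$ on $\langle T, V \rangle$, respectively,
and let $\langle T, r \rangle$ be the run of $\mathcal{U}$ on $\langle T, V \rangle$ induced by $\langle T_R, R \rangle$ and $\langle T, r' \rangle$. We
claim that then, $\langle T, r \rangle$ and $\langle T, r' \rangle$ witness a trap for $\mathcal{U}$ and $\mathcal{U}'$. Since, however,
$\mathcal{L}(\mathcal{U}) \cap \mathcal{L}(\mathcal{U}') = \emptyset$, it follows from Theorem 1 that no such trap exists, and
we reach a contradiction. To see that $\langle T, r \rangle$ and $\langle T, r' \rangle$ indeed witness a trap,
define $E_0 = \{\varepsilon\}$, and define, for $0 \le i \le m - 1$, the set $E_{i+1}$ to contain exactly all
nodes $x$ for which there exists $y \in T_R$ with $R(y) = (x, ((r(x), \gamma), r'(x), i))$ and
$new_i(((r(x), \gamma), r'(x), i)) = i + 1$. That is, for every path $\rho$ of $T$, the set $E_{i+1}$
consists of the nodes in which the $i$-th block is closed in $\tau_\rho$. By the definition
of $\delta$, for all $0 \le i \le m - 1$, the run $\langle T, r \rangle$ contains an $F$-frontier $G_i$ such
that $E_i < G_i < E_{i+1}$ and the run $\langle T, r' \rangle$ contains an $F'$-frontier $G'_i$ such that
$E_i < G'_i < E_{i+1}$. Hence, $E_0, \ldots, E_m$ is a trap for $\mathcal{U}$ and $\mathcal{U}'$. □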



4 Discussion 

Today, automata on infinite objects are used for specification and verification of 
nonterminating programs. By translating specifications to automata, we reduce 
questions about programs and their specifications to questions about automata. 
More specifically, questions such as satisfiability of specifications and correctness 
of programs with respect to their specifications are reduced to questions such 
as nonemptiness and language containment.
The automata-theoretic approach separates the logical and the combinatorial 
aspects of verification. The translation of specifications to automata handles the 
logic and shifts all the combinatorial difficulties to automata-theoretic problems. 
There are many types of automata, and choosing the appropriate type for the 
application is important. 

We believe that weak alternating automata are often a good choice. The special
structure of weak alternating automata is reflected in their attractive
computational properties. For example, while the best known complexity for solving
the 1-letter emptiness problem for Büchi alternating automata is quadratic
time, we know how to solve the problem for weak alternating automata in linear
time. In addition, weak alternating automata can be very easily
complemented. In the linear paradigm, where WS1S=S1S, weak alternating
word automata (AWW) can recognize all the ω-regular languages. In particular,
the translation of LTL formulas to AWW is linear, and follows the syntax of
the formula. Moreover, it is known how to translate other types of automata
to AWW efficiently [KV97, KV98b]. In the branching paradigm, where
WS2S<S2S, AWT can recognize exactly all specifications that can be efficiently
checked symbolically. The translation of CTL and AFMC formulas to AWT is
linear and simple [BVW94]. As we have seen in this paper, the translation of
two NBT for a specification and its complementation to AWT involves only a
quadratic blow up. In particular, we believe that model-checking tools like Mona
[Kla98], which have WS1S and WS2S as their specification languages,
may benefit from employing weak alternating automata.



Abstract. In this paper, we consider computing the difference between
two Horn theories. This problem may arise, for example, if we take care
of a theory change in a knowledge base. In general, the difference of Horn
theories is not Horn. Therefore, we consider Horn approximations of the
difference in terms of Horn cores (i.e., weakest Horn theories included in
the difference). We study the problem under the familiar representation
of Horn theories by Horn CNFs, as well as under the recently proposed
model-based representation in terms of the characteristic models. For all
problems and representations, polynomial time algorithms or proofs of
intractability for the propositional case are provided.



Keywords: computational issues in AI, knowledge compilation, difference of Horn 
theories, Horn approximation, Horn core 



1 Introduction 

Among the basic operations for combining logical theories are the Boolean
operations, i.e., conjunction $\wedge$, disjunction $\vee$, and complement $\neg$. In this paper,
we consider another basic operation, the difference $\setminus$, which can represent the
complement of a theory $\Sigma \subseteq \{0,1\}^n$ (i.e., a set of models) by $\{0,1\}^n \setminus \Sigma$.
In principle, we are interested in a particular fragment of theories (such as Horn
theories), and would like to know whether the result of such operations also
belongs to this fragment.

Along this line, we study the problem of computing the Boolean difference
between two Horn theories $\Sigma_1$ and $\Sigma_2$, i.e., $\Sigma = \Sigma_1 \setminus \Sigma_2$. In general, the resulting
theory $\Sigma$ is not Horn. Therefore, we consider approximating $\Sigma$ by a Horn theory,
in order to maintain the desired closure property. Different such approximations
are possible, among which Horn cores are quite natural.

A Horn theory $\Pi \subseteq \{0,1\}^n$ is a Horn core (or a Horn greatest lower bound)
of a theory $\Sigma \subseteq \{0,1\}^n$, if $\Pi \subseteq \Sigma$, i.e., $\Pi$ logically implies $\Sigma$, and there is no
weaker $\Pi$ of this property, i.e., no Horn $\Pi'$ exists such that $\Pi \subset \Pi' \subseteq \Sigma$.
Observe that, in general, a theory $\Sigma$ has more than one Horn core; e.g., $\Sigma = \{(110), (101)\}$
has two Horn cores $\Sigma_1 = \{(110)\}$ and $\Sigma_2 = \{(101)\}$, respectively.
Approximating a propositional logic theory by Horn theories (or Horn cores) is used
for knowledge compilation. Because of its theoretical and practical importance,
semantical and computational issues on Horn cores have been studied
extensively, and a number of results have been obtained, cf. [11, 2, 12, 7, 1].
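
The small example above can be checked by brute force. The following sketch relies on the characterization, stated in the Preliminaries, that a theory is Horn iff it is closed under bitwise AND of its models; it simply enumerates all subsets, so it is exponential and meant only for tiny theories. The function names and the tuple encoding of models are assumptions made for this illustration.

```python
from itertools import combinations


def is_and_closed(models):
    """True iff the set of 0/1 tuples is closed under componentwise AND."""
    return all(
        tuple(a & b for a, b in zip(v, w)) in models
        for v in models
        for w in models
    )


def horn_cores(theory):
    """All maximal AND-closed (i.e., Horn) subsets of a theory, by exhaustive search."""
    theory = frozenset(theory)
    closed = [
        frozenset(sub)
        for k in range(len(theory) + 1)
        for sub in combinations(sorted(theory), k)
        if is_and_closed(frozenset(sub))
    ]
    # Keep only the subsets that are not strictly contained in another closed subset.
    return {h for h in closed if not any(h < g for g in closed)}
```

Calling `horn_cores({(1, 1, 0), (1, 0, 1)})` yields the two singleton theories, since (110) AND (101) = (100) is not a model of the theory.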

The main contributions of the present paper can be summarized as follows. 



• We present characterizations of the Horn cores of a Horn difference $\Sigma_1 \setminus \Sigma_2$,
which will form a basis of the algorithms discussed in this paper.

• We either present a polynomial time algorithm or prove intractability (unless
P=NP) for each of the problems mentioned above.

• Besides the familiar representation in terms of Horn CNFs, we also consider the
model-based representation of Horn theories through their sets of characteristic
models. This alternative has also been studied repeatedly, since it offers
advantages over formula-based representation in certain cases; see [14] for
more details.



Our results on the complexity of these issues are summarized in Table 1,
which gives a complete picture of the tractability/intractability frontier of these
problems. The table also shows results on the Horn envelope [11, 12] of the
difference, i.e., the (unique) least Horn theory $\Pi$ such that $\Sigma_1 \setminus \Sigma_2 \subseteq \Pi$, which we
do not discuss here. The interested reader is referred to p]. 

The results on formula-based representation will complement those results 
known from mm, which focus on theories represented by CNFs or restricted 
classes of CNFs, i.e., the problem input is a (possibly restricted) CNF. In con- 
trast, we also deal with non-CNF formulas. The tractability of computing one 
Horn core of a Horn difference is a positive result, since this problem is intractable 
for a general theory. 

For the case in which a theory is represented by a CNF formula $\varphi$, 0 contains
a polynomial time algorithm for computing a Horn core of $\varphi$ by consulting an NP
oracle, and shows that the problem is $P^{NP}[O(\log n)]$-hard. Most recently, in P,
another algorithm for computing a Horn core from a given CNF is presented,
which is based on the classical Davis-Putnam procedure for the satisfiability
problem. Observe, however, that these algorithms are not efficiently applicable
to our problem, since they require a CNF formula for input; by rewriting the
difference of Horn CNFs $\varphi_1$ and $\varphi_2$ to a CNF, the size of the formula
representing $\varphi_1 \setminus \varphi_2$ might exponentially increase. Our result that the Horn cores of a
Horn difference $\varphi_1 \setminus \varphi_2$ can be enumerated with polynomial delay parallels the
analogous result if all models of a theory $\Sigma$ are given as input.

Our results find an application in the area of theory change, which has grown
into an important research area within AI during the last decade. Taking the
difference between Horn theories is, for instance, meaningful in the following
scenario. Assume that we have a Horn theory $\Sigma$, which is a description of all
the possible worlds of a state of affairs; i.e., the real “world” amounts to one of
the models in $\Sigma$, say $w$, but is unknown to us. Suppose that we now obtain the
information that a formula $\varphi$ is false in $w$. Since $w$ is a categorical description of
the world, either $\psi$ or $\neg\psi$ is true in it (but not both), for any formula $\psi$. Hence,
in this case, we conclude that $\neg\varphi$ is true in $w$, and thus all models $v$ that satisfy
$\varphi$ can be discarded from $\Sigma$. This amounts to updating $\Sigma$ to $\Sigma_{new} = \Sigma \setminus \Sigma'$,
where $\Sigma'$ is the set of models satisfying $\varphi$; if $\varphi$ is Horn, this gives an instance of
our problem. If $\Sigma_{new}$ is not Horn, the Horn cores and the Horn envelope provide
sound and complete approximations of the set $\Sigma_{new}$ in terms of Horn theories.
Our algorithms can be applied for computing $\Sigma_{new}$ or its Horn approximations.
Observe that computing a Horn core and the Horn envelope in the context of
theory change was studied in 0, in which the theory $\varphi \circ \psi$, given by two Horn
CNFs $\varphi$ and $\psi$, is studied for some revision operators “$\circ$” (see 0). However,
their work is only weakly related to ours, since taking the difference of $\varphi$ and
$\psi$ was not considered, and moreover, the formula $\psi$ is restricted to be a single
Horn clause.

Table 1. Complexity of problems on the Horn difference $\Sigma_1 \setminus \Sigma_2$ ($\Sigma_1$, $\Sigma_2$ and
$\Pi$ are Horn)

                                                        representation
Problem                                                 formula-based (Horn CNF)                 model-based (characteristic set)
------------------------------------------------------  ---------------------------------------  ---------------------------------------
Is $\Sigma_1 \setminus \Sigma_2$ Horn?                  P                                        P
Compute $\Sigma_1 \setminus \Sigma_2$, if it is Horn    P                                        P
Is $\Pi$ a Horn core of $\Sigma_1 \setminus \Sigma_2$?  P                                        P
Compute one Horn core of $\Sigma_1 \setminus \Sigma_2$  P                                        P
Compute all Horn cores of $\Sigma_1 \setminus \Sigma_2$ polynomial delay                         not polynomial total time unless P = NP
Is $\Pi$ the Horn envelope of $\Sigma_1 \setminus \Sigma_2$?  co-NP-complete                     P
Compute the Horn envelope of $\Sigma_1 \setminus \Sigma_2$    not polynomial total time unless P = NP  P

For space reasons, most proofs are omitted. They are given in p]. 

2 Preliminaries 

We assume a supply of propositional variables $x_1, x_2, \ldots, x_n$, where each $x_i$
evaluates to either 1 (true) or 0 (false). Negated variables are denoted by $\bar{x}_i$. The $x_i$
and $\bar{x}_i$ are called literals. A clause is a disjunction $c = \ell_1 \vee \cdots \vee \ell_k$ of literals, while
a term is a conjunction $t = \ell_1 \wedge \cdots \wedge \ell_k$ of literals. By $P(c)$ and $N(c)$ (resp., $P(t)$
and $N(t)$) we denote the sets of variables occurring positively and negatively in
$c$ (resp., $t$); $\bot$ (resp., $\top$) denotes the empty clause (resp., empty term)
representing falsity (resp., truth). A conjunctive normal form (CNF) (resp., disjunctive
normal form (DNF)) is a conjunction of clauses $\varphi = \bigwedge_i c_i$ (resp., a disjunction
of terms $\varphi = \bigvee_i t_i$).

Thomas Eiter, Toshihide Ibaraki, and Kazuhisa Makino

A model is a vector $v \in \{0,1\}^n$, whose $i$-th component is denoted by $v_i$. The
models (0,0,...,0) and (1,1,...,1) are denoted by 0 and 1, respectively. We use
$v \le w$ for the usual bitwise ordering of models, i.e., $v_i \le w_i$ for all $i = 1, \ldots, n$,
where $0 \le 1$. A theory is any set $\Sigma \subseteq \{0,1\}^n$ of models. A model $v \in \Sigma$ is
minimal in $\Sigma$, if no $w \in \Sigma$ exists such that $w < v$.

For any formula $\varphi$, let $T(\varphi) = \{v \in \{0,1\}^n \mid \varphi(v) = 1\}$ be the set of its
models. A formula $\varphi$ represents a theory $\Sigma$ if $T(\varphi) = \Sigma$. If unambiguous, we
do not distinguish a formula from the theory it represents. We write $\varphi \le \psi$, if
$T(\varphi) \subseteq T(\psi)$ holds. A nontautological clause $c$ (resp., noncontradictory term $t$)
is an implicate (resp., implicant) of a theory $\Sigma$ if $c(v) = 1$ for all $v \in \Sigma$, i.e.,
$\{0,1\}^n \supset T(c) \supseteq \Sigma$ (resp., $t(v) = 0$ for all $v \notin \Sigma$, i.e., $\emptyset \subset T(t) \subseteq \Sigma$); it is prime,
if no proper subclause (resp., subterm) is an implicate (resp., implicant) of $\Sigma$.

A theory $\Sigma$ is Horn if $\Sigma = Cl_\wedge(\Sigma)$ holds, where $Cl_\wedge(S)$ is the closure
of $S \subseteq \{0,1\}^n$ under bitwise AND (i.e., intersection) of models $v$ and $w$,
denoted by $v \wedge w$. Observe that any Horn theory $\Sigma$ has the least (unique
smallest) model, which is given by $\bigwedge_{v \in \Sigma} v$. For a Horn theory $\Sigma$, a model $v \in \Sigma$
is called characteristic, if $v \notin Cl_\wedge(\Sigma \setminus \{v\})$ holds. The set of all
characteristic models of $\Sigma$, the characteristic set of $\Sigma$, is denoted by $C^*(\Sigma)$. Note
that every Horn theory $\Sigma$ has the unique characteristic set $C^*(\Sigma)$. For example,
the theory $\Sigma = \{(0101), (1001), (1000), (0001), (0000)\}$ is Horn, and has
$C^*(\Sigma) = \{(0101), (1001), (1000)\}$.
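
The closure operator and the characteristic set can be illustrated directly on this example. The code below is a naive sketch under assumed names (`meet`, `closure`, `characteristic_set`, and the helper `bits`); models are encoded as tuples of 0/1, and the closure is computed by repeated pairwise AND, which is adequate only for small theories.

```python
def bits(s):
    """Hypothetical helper: turn a string such as '0101' into a model tuple."""
    return tuple(int(ch) for ch in s)


def meet(v, w):
    """Bitwise AND of two models."""
    return tuple(a & b for a, b in zip(v, w))


def closure(models):
    """Closure of a set of models under bitwise AND (naive fixpoint iteration)."""
    closed = set(models)
    changed = True
    while changed:
        changed = False
        for v in list(closed):
            for w in list(closed):
                m = meet(v, w)
                if m not in closed:
                    closed.add(m)
                    changed = True
    return closed


def characteristic_set(theory):
    """Models of a Horn theory that cannot be obtained by ANDing the other models."""
    theory = set(theory)
    return {v for v in theory if v not in closure(theory - {v})}
```

On the theory of the example, `closure` returns the theory unchanged (so it is Horn), and `characteristic_set` returns the three models (0101), (1001) and (1000); the remaining two models are (0101) AND (1001) = (0001) and (0101) AND (1000) = (0000).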

A clause $c$ is Horn (resp., negative, positive) if $|P(c)| \le 1$ (resp., $|P(c)| = 0$,
$|N(c)| = 0$). A CNF is Horn (resp., negative, positive) if it contains only Horn
(resp., negative, positive) clauses. It is well-known that a theory $\Sigma$ is Horn if
and only if it is represented by some Horn CNF, and that all prime implicates of
a Horn theory are Horn. A theory is negative (resp., positive), if it is represented
by some negative (resp., positive) CNF. If $\varphi$ represents a theory $\Sigma$, and $\psi$ is a
Horn CNF representing a Horn core of $\Sigma$, then $\psi$ is also called a Horn core of
$\varphi$.

3 Horn Property of the Difference 

3.1 Formula-Based Representation 

Let $\varphi_1 = \bigwedge_{i=1}^{m_1} c_{1,i}$ and $\varphi_2 = \bigwedge_{j=1}^{m_2} c_{2,j}$ be Horn CNFs. Then

\[ \varphi_1 \setminus \varphi_2 = \bigvee_{j=1}^{m_2} (\varphi_1 \wedge \neg c_{2,j}) = \bigvee_{j=1}^{m_2} (\varphi_1 \wedge t_{2,j}), \tag{3.1} \]

where $t_{2,j}$ is the term equivalent to $\neg c_{2,j}$; e.g., if $c_{2,j} = (\bar{x}_1 \vee \bar{x}_2 \vee x_3)$, then
$t_{2,j} = x_1 x_2 \bar{x}_3$. Note that each $\psi_j = \varphi_1 \wedge t_{2,j}$ is a Horn CNF. Hence $\varphi_1 \setminus \varphi_2$
can be represented by the disjunction of $m_2$ Horn theories $\psi_j$. Observe that in
general, deciding whether a disjunction of Horn CNFs is Horn is co-NP-complete.
However, we sketch a proof that checking the Horn property of the difference
of Horn theories is polynomial.
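
The rewriting (3.1) can be checked on small inputs by enumerating models. In the sketch below a CNF is a list of clauses, and a clause is a list of nonzero integers (`+i` for the variable x_i, `-i` for its negation); this encoding and the function names are assumptions of the example, not notation from the paper. The term equivalent to a negated clause is represented as a list of unit clauses.

```python
from itertools import product


def satisfies(cnf, v):
    """True iff the model v (a tuple of 0/1) satisfies every clause of the CNF."""
    return all(
        any((v[abs(lit) - 1] == 1) == (lit > 0) for lit in clause)
        for clause in cnf
    )


def models(cnf, n):
    """All models of a CNF over n variables, by exhaustive enumeration."""
    return {v for v in product((0, 1), repeat=n) if satisfies(cnf, v)}


def difference_by_terms(phi1, phi2, n):
    """Models of phi1 minus phi2, computed as the union of the psi_j of (3.1)."""
    result = set()
    for clause in phi2:
        term = [[-lit] for lit in clause]  # negated clause as unit clauses
        result |= models(phi1 + term, n)   # psi_j = phi1 AND t_{2,j}
    return result
```

By construction, `difference_by_terms(phi1, phi2, n)` coincides with `models(phi1, n) - models(phi2, n)`; each `phi1 + term` is again a Horn CNF when `phi1` is Horn, because unit clauses are Horn.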

If $\varphi_2 = \bot$, then

\[ \varphi_1 \setminus \varphi_2 = \varphi_1 \tag{3.2} \]

clearly holds, and hence the difference is Horn. Otherwise, we compute the least
model $v$ in $\varphi_2$, and separately consider the two cases: (1) $\varphi_1(v) = 0$ and (2)
$\varphi_1(v) = 1$.

Case (1). Since $\varphi_1$ is Horn, there is a Horn clause $c^*$ such that $c^* \ge \varphi_1$
and $c^*(v) = 0$. For a model $u$, let $HC(u)$ denote the set of all maximal Horn
clauses $c$ such that $c(u) = 0$. E.g., if $u = (111000)$, then
$HC(u) = \{(\bar{x}_1 \vee \bar{x}_2 \vee \bar{x}_3 \vee x_4), (\bar{x}_1 \vee \bar{x}_2 \vee \bar{x}_3 \vee x_5), (\bar{x}_1 \vee \bar{x}_2 \vee \bar{x}_3 \vee x_6)\}$.
Now, some $c$ in $HC(v)$ satisfies $c \ge c^* (\ge \varphi_1)$.

It can be shown that if $v \ne 1$, then

\[ \varphi_1 \setminus \varphi_2 = \varphi_1 \setminus (\varphi_2 \wedge x_i) \tag{3.3} \]

holds for the $x_i$ such that $P(c) = \{x_i\}$. On the other hand, if $v = 1$, then
$T(\varphi_2) = \{1\}$ and it follows that

\[ \varphi_1 \setminus \varphi_2 = \varphi_1. \tag{3.4} \]

Case (2). Clearly $(\varphi_1 \setminus \varphi_2)(v) = 0$. Thus, if $\varphi_1 \setminus \varphi_2$ represents a Horn theory,
there exists a Horn clause $c^*$ such that $c^* \ge \varphi_1 \setminus \varphi_2$ and $c^*(v) = 0$. Some $c$ in
$HC(v)$ then satisfies $c \ge c^* (\ge \varphi_1 \setminus \varphi_2)$. It can be shown that if $v \ne 1$, then

\[ \varphi_1 \setminus \varphi_2 = (\varphi_1 \wedge c) \setminus (\varphi_2 \wedge x_i) \tag{3.5} \]

holds, where the $x_i$ appears in $c$ (i.e., $P(c) = \{x_i\}$).

On the other hand, if $v = 1$, then in a manner similar to (3.4), we prove

\[ \varphi_1 \setminus \varphi_2 = \varphi_1 \wedge c. \tag{3.6} \]

Now we iterate as long as possible. As soon as $\varphi_2 = \bot$ or $v = 1$ holds, we can
conclude that $\varphi_1 \setminus \varphi_2$ is Horn by (3.2), (3.4) or (3.6). If there is no $c \in HC(v)$
such that $c \ge \varphi_1 \setminus \varphi_2$, then $\varphi_1 \setminus \varphi_2$ is not Horn. In the remaining cases, we apply
(3.3) or (3.5) (i.e., $\varphi_1$ is modified to $\varphi_1 \wedge c$ in case of (3.5), and $\varphi_2$ to $\varphi_2 \wedge x_i$ in
both cases).

It can be shown that this procedure halts in finitely many steps. Formally, 
the algorithm can be written as follows. 

Algorithm CHECK-HORN
Input: Horn CNFs $\varphi_1$ and $\varphi_2$.
Output: If $\varphi_1 \setminus \varphi_2$ is a Horn theory $\Sigma$, then output a Horn CNF for $\Sigma$; otherwise,
“No”.

Step 0. $\phi_1 := \varphi_1$; $\phi_2 := \varphi_2$;

Step 1. if $\phi_2 = \bot$ then output $\phi_1$ and halt
        else begin compute the least model $v$ of $\phi_2$;
             if $\phi_1(v) = 0$ then goto Step 2
             else (i.e., $\phi_1(v) = 1$) goto Step 3
        end;

Step 2. if $v = 1$ then output $\phi_1$ and halt
        else begin
             find a Horn clause $c$ in $HC(v)$ such that $c \ge \phi_1$;
             $\phi_2 := \phi_2 \wedge x_i$, where $P(c) = \{x_i\}$;
             goto Step 1
        end;

Step 3. Find a Horn clause $c$ in $HC(v)$ such that $c \ge \phi_1 \setminus \phi_2$;
        if no such $c$ exists then output “No” and halt
        else begin $\phi_1 := \phi_1 \wedge c$;
             if $v = 1$ then output $\phi_1$ and halt
             else $\phi_2 := \phi_2 \wedge x_i$, where $P(c) = \{x_i\}$;
             goto Step 1
        end. □
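
The control flow of CHECK-HORN can be mirrored in a short Python sketch. It follows the steps of the algorithm, but it decides every semantic test (least model, implication of a clause) by enumerating all $2^n$ models, so it does not have the polynomial running time of the actual algorithm and is useful only for experimenting with small inputs. The clause encoding (lists of signed integers) and all function names are assumptions of this illustration; the return value `None` plays the role of the answer “No”.

```python
from itertools import product


def satisfies(cnf, v):
    """True iff the model v (a tuple of 0/1) satisfies every clause of the CNF."""
    return all(
        any((v[abs(lit) - 1] == 1) == (lit > 0) for lit in clause)
        for clause in cnf
    )


def models(cnf, n):
    """All models of a CNF over n variables (brute force)."""
    return {v for v in product((0, 1), repeat=n) if satisfies(cnf, v)}


def hc(v):
    """HC(v): the maximal Horn clauses that are false at the model v."""
    n = len(v)
    body = [-(j + 1) for j in range(n) if v[j] == 1]
    if all(v):
        return [body]  # only the all-negative clause is false at 1
    return [body + [i + 1] for i in range(n) if v[i] == 0]


def check_horn(phi1, phi2, n):
    """Return a Horn CNF for phi1 minus phi2 if that theory is Horn, else None."""
    phi1, phi2 = list(phi1), list(phi2)                # Step 0
    while True:
        m2 = models(phi2, n)                           # Step 1
        if not m2:
            return phi1
        v = tuple(min(col) for col in zip(*m2))        # least model = AND of all models
        step3 = satisfies(phi1, v)
        if not step3 and all(v):                       # Step 2, v = 1
            return phi1
        target = models(phi1, n) - m2 if step3 else models(phi1, n)
        c = next((c for c in hc(v)
                  if all(satisfies([c], u) for u in target)), None)
        if c is None:                                  # Step 3: no suitable clause
            return None
        if step3:
            phi1.append(c)
            if all(v):
                return phi1
        phi2.append([lit for lit in c if lit > 0])     # phi2 := phi2 AND x_i
```

Each pass through the loop adds to `phi2` a unit clause $x_i$ with $v_i = 0$, so the least model strictly grows (or `phi2` becomes unsatisfiable) and the loop runs at most $n + 1$ times.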

Example 1. Let us apply CHECK-HORN to
$\varphi_1 = (\bar{x}_1 \vee \bar{x}_2) \wedge (\bar{x}_1 \vee x_3) \wedge (\bar{x}_2 \vee x_3) \wedge (\bar{x}_2 \vee x_4)$
and $\varphi_2 = \bar{x}_4 \wedge (\bar{x}_1 \vee \bar{x}_3) \wedge (\bar{x}_2 \vee x_3)$. As we can check, the
theory represented by $\varphi_1 \setminus \varphi_2$ is $\Sigma = \{(0111), (1011), (1010), (0011), (0001)\}$. Let
$\phi_1 := \varphi_1$ and $\phi_2 := \varphi_2$. Since $\phi_2 \ne \bot$ in Step 1, the least model $v = (0000)$ of
$\phi_2$ is computed. Since $\phi_1(v) = 1$, we branch to Step 3. Then $HC(v) = \{x_i \mid i = 1, 2, \ldots, 4\}$,
and we can check that $\phi_1 \setminus \phi_2 \not\le x_i$ holds for all $i = 1, 2, \ldots, 4$. Thus,
the algorithm outputs “No” in Step 3. Observe that this is correct, since (1010),
(0001) $\in \Sigma$ but (0000) = (1010) $\wedge$ (0001) $\notin \Sigma$, which means that $\Sigma$ is not Horn.

On the other hand, let $\varphi_1$ be as above and $\varphi'_2 = \bar{x}_4 \wedge (\bar{x}_2 \vee x_3)$. Then, $\varphi_1 \setminus \varphi'_2$
represents $\Sigma' = \{(0111), (1011), (0011), (0001)\}$. Let $\phi_1 := \varphi_1$ and $\phi_2 := \varphi'_2$.
Then, in Step 1 the least model is $v = (0000)$, and we again branch to Step 3,
where $HC(v)$ is as previous. Now, $\phi_1 \setminus \phi_2 \le x_4$ holds. Hence, we update $\phi_1$ to
$\phi_1 \wedge x_4$ and $\phi_2$ to $\phi_2 \wedge x_4$. Returning to Step 1, we find that $\phi_2 = \bot$ holds.
Hence,

\[ \phi_1 = (\bar{x}_1 \vee \bar{x}_2) \wedge (\bar{x}_1 \vee x_3) \wedge (\bar{x}_2 \vee x_3) \wedge (\bar{x}_2 \vee x_4) \wedge x_4
          = (\bar{x}_1 \vee \bar{x}_2) \wedge (\bar{x}_1 \vee x_3) \wedge (\bar{x}_2 \vee x_3) \wedge x_4 \]

is output. Observe that $\Sigma'$ is represented by this $\phi_1$, and thus the output is
correct. □

An analysis of the time complexity yields the first result. Note that $\varphi \equiv \bot$ for
Horn $\varphi$ is decidable in linear time [3], and that by (3.1), a Horn difference $\varphi_1 \setminus \varphi_2$
is representable as Horn disjunction $\psi_1 \vee \cdots \vee \psi_{m_2}$. Thus, $c \ge \varphi_1 \setminus \varphi_2$ is equivalent
to $\psi_j \wedge \neg c \equiv \bot$ for all $j = 1, 2, \ldots, m_2$.

Theorem 1. Let $\varphi_1$ and $\varphi_2$ be Horn CNFs. Then, algorithm CHECK-HORN
checks whether $\varphi_1 \setminus \varphi_2$ is Horn in 0 {n'^ + n^\p2\ + n^\pi\ + n\pi\\p2\) time.



On the Difference of Horn Theories 






3.2 Model-Based Representation 

Let $\Sigma_1$ and $\Sigma_2$ be Horn theories. Let $S$ be the set of models defined by

\[ S = C^*(\Sigma_1) \cup \{v \wedge w \mid v, w \in C^*(\Sigma_1)\}. \tag{3.7} \]

That is, $S$ augments $C^*(\Sigma_1)$ by one step of the closure operator $Cl_\wedge(\cdot)$. We split
$S$ into $S_1$ and $S_2$ as follows:

\[ S_1 \cap \Sigma_2 = \emptyset \text{ and } S_2 \subseteq \Sigma_2. \tag{3.8} \]

It is easy to see that $Cl_\wedge(S_1) \supseteq \Sigma_1 \setminus \Sigma_2$ and $Cl_\wedge(S_2) \subseteq \Sigma_2$.

Lemma 1. For a Horn theory $\Sigma$, let $S$ be as in (3.7), and let $S_1$ and $S_2$ be sets
of models such that $S_1 \cup S_2 = S$. Then $Cl_\wedge(S_1) \cup Cl_\wedge(S_2) = \Sigma$.

By this lemma, it holds for the $S_1$ and $S_2$ of (3.8) that

\[ Cl_\wedge(S_1) \cup Cl_\wedge(S_2) = \Sigma_1. \tag{3.9} \]

The next lemma leads then to a polynomial time algorithm for our problem.

Lemma 2. Let $\Sigma_1$ and $\Sigma_2$ be Horn theories, and let $S_1$ be as in (3.8). Then
$Cl_\wedge(S_1) \cap \Sigma_2 = \emptyset$ holds if and only if $\Sigma_1 \setminus \Sigma_2$ is a Horn theory. Furthermore, if
$\Sigma_1 \setminus \Sigma_2$ is a Horn theory, then $Cl_\wedge(S_1) = \Sigma_1 \setminus \Sigma_2$ (i.e., $C^*(S_1) = C^*(\Sigma_1 \setminus \Sigma_2)$)
holds.

It is known that given $Q_1, Q_2 \subseteq \{0,1\}^n$, deciding $Cl_\wedge(Q_1) \cap Cl_\wedge(Q_2) = \emptyset$
is possible in $O(n(|Q_1| + |Q_2|))$ time. Since $\Sigma_2 = Cl_\wedge(C^*(\Sigma_2))$, we thus obtain
from Lemma 2, turned into a straightforward algorithm, the following result.

Theorem 2. Let $\Sigma_1, \Sigma_2$ be Horn theories. Given $C^*(\Sigma_1)$ and $C^*(\Sigma_2)$, we can
check whether $\Sigma_1 \setminus \Sigma_2$ is Horn in $O(n|C^*(\Sigma_1)|^2|C^*(\Sigma_2)|)$ time. Furthermore, if
$\Sigma_1 \setminus \Sigma_2$ is Horn, $C^*(\Sigma_1 \setminus \Sigma_2)$ can be computed in
$O(n|C^*(\Sigma_1)|^2(|C^*(\Sigma_1)|^2 + |C^*(\Sigma_2)|))$ time.
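
The test of Lemma 2 can be tried out on small characteristic sets. The sketch below builds $S$ as in (3.7), takes $S_1 = S \setminus \Sigma_2$ as in (3.8), and checks whether the closure of $S_1$ meets $\Sigma_2$. For simplicity it materializes the closures explicitly, which can be exponentially large in general; the cited $O(n(|Q_1| + |Q_2|))$ disjointness test is not implemented here. All function names are assumptions of this illustration.

```python
def meet(v, w):
    """Bitwise AND of two models given as 0/1 tuples."""
    return tuple(a & b for a, b in zip(v, w))


def closure(models):
    """Closure under bitwise AND (naive fixpoint iteration)."""
    closed = set(models)
    changed = True
    while changed:
        changed = False
        for v in list(closed):
            for w in list(closed):
                m = meet(v, w)
                if m not in closed:
                    closed.add(m)
                    changed = True
    return closed


def one_step_set(char1):
    """The set S of (3.7): C*(Sigma_1) plus all pairwise ANDs."""
    return set(char1) | {meet(v, w) for v in char1 for w in char1}


def difference_is_horn(char1, char2):
    """Lemma 2 as a check; returns (is_horn, closure of S_1)."""
    sigma2 = closure(char2)
    s1 = one_step_set(char1) - sigma2      # the S_1 of (3.8)
    cl_s1 = closure(s1)
    return (not (cl_s1 & sigma2)), cl_s1
```

With $C^*(\Sigma_1) = \{(0101), (1001), (1000)\}$, the difference with $\Sigma_2 = \{(0000)\}$ is reported as not Horn, while the difference with $\Sigma_2 = \{(1000)\}$ is Horn and equals the returned closure of $S_1$.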

4 Horn Cores 

4.1 Formula-Based Representation 

We first consider the problem of computing one Horn core of the difference
$\varphi_1 \setminus \varphi_2$, where $\varphi_1$ and $\varphi_2$ are Horn CNFs. The algorithm is a modification of
algorithm CHECK-HORN, which checks whether the difference $\varphi_1 \setminus \varphi_2$ is Horn.

In Step 3 of algorithm CHECK-HORN, if no Horn clause $c$ is in $HC(v)$
such that $c \ge \phi_1 \setminus \phi_2$, we conclude that the difference is not Horn and halt.
To compute a Horn core, however, we update $\phi_1$ and $\phi_2$ as $\phi_1 := \phi_1 \wedge c$ and
$\phi_2 := \phi_2 \wedge x_i$, respectively, for an appropriate $c \in HC(v)$. This is because no
Horn core contains $v$.

Denote for a formula $\psi$ and a model $w$, by $\min_{>w}(\psi)$ the set of minimal
models $u$ such that $\psi(u) = 1$ and $u > w$. Then, it holds that any clause $c \in HC(v)$
such that $c(u) = 1$ for some $u \in \min_{>v}(\phi_1 \setminus \phi_2)$ is appropriate for our
purpose.

For models $v, u$ with $v < u$, let $HC(v, u)$ be the set of Horn clauses $c \in HC(v)$
whose positive literal $x_j$ satisfies $u_j = 1$. E.g., if $v = (111000)$ and $u = (111110)$,
then $HC(v, u) = \{(\bar{x}_1 \vee \bar{x}_2 \vee \bar{x}_3 \vee x_4), (\bar{x}_1 \vee \bar{x}_2 \vee \bar{x}_3 \vee x_5)\}$. To make the discussion
clear and speed up the algorithm, based on a model $u \in \min_{>v}(\phi_1 \setminus \phi_2)$, we
update $\phi_1$ and $\phi_2$ respectively by

\[ \phi_1 := \phi_1 \wedge \bigwedge_{c \in HC(v,u)} c \quad \text{and} \quad
   \phi_2 := \phi_2 \wedge \bigwedge_{j:\, u_j = 1 \wedge v_j = 0} x_j. \tag{4.10} \]
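
The clause sets $HC(v)$ and $HC(v, u)$ are easy to generate explicitly. In the following sketch a clause is a list of signed integers (`-j` for a negated $x_j$, `+i` for a positive $x_i$); the encoding and the function names `hc` and `hc_restricted` are assumptions made for the example.

```python
def hc(v):
    """HC(v): the maximal Horn clauses that evaluate to 0 at the model v."""
    n = len(v)
    body = [-(j + 1) for j in range(n) if v[j] == 1]  # negate every variable set in v
    if all(v):
        return [body]                                 # no variable is left for the head
    return [body + [i + 1] for i in range(n) if v[i] == 0]


def hc_restricted(v, u):
    """HC(v, u): clauses of HC(v) whose positive literal x_j has u_j = 1."""
    return [c for c in hc(v) if any(lit > 0 and u[lit - 1] == 1 for lit in c)]
```

For $v = (111000)$ and $u = (111110)$, `hc_restricted(v, u)` keeps the two clauses with heads $x_4$ and $x_5$ and drops the one with head $x_6$, matching the example in the text.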



Thus, we have the following algorithm. 

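Algorithm HORN-CORE1
Input: Horn CNFs $\varphi_1$ and $\varphi_2$.
Output: A Horn core of $\varphi_1 \setminus \varphi_2$.

Steps 0.-2. as in CHECK-HORN

Step 3. if there exists a Horn clause $c$ in $HC(v)$ such that $c \ge \phi_1 \setminus \phi_2$
        then begin $\phi_1 := \phi_1 \wedge c$;
             if $v = 1$ then output $\phi_1$ and halt
             else $\phi_2 := \phi_2 \wedge x_i$, where $P(c) = \{x_i\}$
        end
        else begin find a model $u \in \min_{>v}(\phi_1 \setminus \phi_2)$;
             $\phi_1 := \phi_1 \wedge \bigwedge_{c \in HC(v,u)} c$;
             $\phi_2 := \phi_2 \wedge \bigwedge_{j:\, u_j = 1 \wedge v_j = 0} x_j$
        end
        goto Step 1. □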

Let $\phi_1^{(k)}$ and $\phi_2^{(k)}$ respectively denote the $\phi_1$ and $\phi_2$ in Step 1 of the $k$-th
iteration, and let $v^{(k)}$ be the least model in $\phi_2^{(k)}$. Then, it can be shown that

\[ v^{(k)} < v^{(k+1)} \quad \text{for } k = 1, 2, \ldots. \tag{4.11} \]

This implies that the number of iterations is at most $n + 1$. The only nonobvious
polynomial operation in HORN-CORE1 is finding some $u \in \min_{>v}(\phi_1 \setminus \phi_2)$
in Step 3. The models $\{w > v \mid (\phi_1 \setminus \phi_2)(w) = 1\}$ can be described by a
disjunction of $m$ Horn CNFs $\psi_1, \ldots, \psi_m$, where $m$ is the number of clauses in
$\phi_2$. Consequently, the models in $\min_{>v}(\phi_1 \setminus \phi_2)$ are the minimal models among
the least models of $\psi_1, \ldots, \psi_m$. Each $\psi_i$ results by fixing the value of some literals
in $\phi_1$, and its least model can be computed in $O(|\phi_1| + n)$ time exploiting P|.
Looking at the number of 1-bits in each model, we can thus find some
$u \in \min_{>v}(\phi_1 \setminus \phi_2)$ in $O(m(|\phi_1| + n) + mn) = O(m(|\phi_1| + n))$ time. An analysis of
the time complexity gives us then the following result.

Theorem 3. Let $\varphi_1$ and $\varphi_2$ be Horn CNFs. Then algorithm HORN-CORE1
outputs a Horn core of $\varphi_1 \setminus \varphi_2$ in 0{n'^ + n^\ip 2 \ + + n\<pi\\p 2 \) time.

As regards computing all Horn cores of $\varphi_1 \setminus \varphi_2$, we note that this problem
is provably not solvable in polynomial time, since in general, an exponential
number of Horn cores might exist. However, we have a polynomial total time
algorithm (called output-polynomial in [5]), i.e., it runs in polynomial time in the
combined size of the input and the output. In fact, it enumerates all Horn cores
with polynomial delay.

Note that in HORN-CORE1, the choice of $u \in \min_{>v}(\phi_1 \setminus \phi_2)$ in the
else-statement of Step 3 results in a Horn core $\psi$ of $\varphi_1 \setminus \varphi_2$ such that $\psi(u) = 1$ and
$\psi(u') = 0$ for all other models $u' \in \min_{>v}(\phi_1 \setminus \phi_2)$. We can also show that every
Horn core $\psi$ satisfies $\psi(u) = 1$ for some $u$. This means that every Horn core can
be constructed by algorithm HORN-CORE1, if it properly chooses a model $u$ in
the else-statement of Step 3.

Thus, all Horn cores can be generated as follows. 

Algorithm ALL-HORN-CORES($\varphi_1$, $\varphi_2$)

Input: Horn CNFs $\varphi_1$ and $\varphi_2$.
Output: All Horn cores of $\varphi_1 \setminus \varphi_2$.

Steps 0.-2. as in HORN-CORE1 and CHECK-HORN.

Step 3. if there exists a Horn clause $c$ in $HC(v)$ such that $c \ge \phi_1 \setminus \phi_2$
        then begin $\phi_1 := \phi_1 \wedge c$;
             if $v = 1$ then output $\phi_1$ and exit
             else begin $\phi_2 := \phi_2 \wedge x_i$, where $P(c) = \{x_i\}$; goto Step 1 end
        end
        else for each model $u \in \min_{>v}(\phi_1 \setminus \phi_2)$ do
        begin $\phi_1 := \phi_1 \wedge \bigwedge_{c \in HC(v,u)} c$;
             $\phi_2 := \phi_2 \wedge \bigwedge_{j:\, u_j = 1 \wedge v_j = 0} x_j$;
             call ALL-HORN-CORES($\phi_1$, $\phi_2$)
        end{for}. □

An analysis of the time complexity leads us to the following result. 

Theorem 4. Algorithm ALL-HORN-CORES correctly generates all Horn cores
of $\varphi_1 \setminus \varphi_2$ with polynomial delay, where the delay is bounded by the time of
computing one Horn core, i.e., -\- n^\p 2 \ + n’^\pi\+ n\pi\\p 2 \).

4.2 Model-Based Representation 

In Subsection 3.2, we have seen that $S_1$ of (3.8) characterizes the difference
$\Sigma_1 \setminus \Sigma_2$ of two Horn theories $\Sigma_1$ and $\Sigma_2$. In this subsection, we show that any
maximal set $Q \subseteq S_1$ such that $Cl_\wedge(Q) \cap \Sigma_2 = \emptyset$ gives a Horn core of $\Sigma_1 \setminus \Sigma_2$,
and conversely any Horn core of $\Sigma_1 \setminus \Sigma_2$ can be generated from such a maximal
set $Q \subseteq S_1$.

Note that, by the definition of $S_1$, any maximal set $Q \subseteq S$ with $Cl_\wedge(Q) \cap \Sigma_2 = \emptyset$
is contained in $S_1$. Thus we prove the above statements by using the set $S$ instead
of $S_1$.

Theorem 5. Let $\Sigma_1, \Sigma_2$ be Horn theories and let $S$ be as in (3.7). Then, $\Pi$ is
a Horn core of $\Sigma_1 \setminus \Sigma_2$ iff $\Pi = Cl_\wedge(Q)$ for some maximal $Q \subseteq S$ such that
$Cl_\wedge(Q) \cap \Sigma_2 = \emptyset$. □

Based on Theorem 5, we obtain the following polynomial time algorithm.

Algorithm HORN-CORE2
Input: Characteristic sets $C^*(\Sigma_1)$ and $C^*(\Sigma_2)$ of Horn theories $\Sigma_1$ and $\Sigma_2$.
Output: The characteristic set $C^*(\Pi)$ of a Horn core $\Pi$ of $\Sigma_1 \setminus \Sigma_2$.

Step 0. $Q := \emptyset$;
        compute the set of models $S$ given by (3.7);

Step 1. for each $w \in S$ do begin
             if $Cl_\wedge(Q \cup \{w\}) \cap \Sigma_2 = \emptyset$ then $Q := Q \cup \{w\}$
        end{for};

Step 2. output $C^*(Q)$ and halt. □
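
The greedy loop of HORN-CORE2 can be sketched as follows. As with the other illustrations, closures are materialized explicitly instead of using the efficient disjointness test, so the sketch is faithful to the steps of the algorithm but not to its running time. The iteration order over $S$ (characteristic models first, then the pairwise ANDs) is an arbitrary choice of this example and determines which Horn core is found; all function names are assumptions.

```python
def meet(v, w):
    """Bitwise AND of two models given as 0/1 tuples."""
    return tuple(a & b for a, b in zip(v, w))


def closure(models):
    """Closure under bitwise AND (naive fixpoint iteration)."""
    closed = set(models)
    changed = True
    while changed:
        changed = False
        for v in list(closed):
            for w in list(closed):
                m = meet(v, w)
                if m not in closed:
                    closed.add(m)
                    changed = True
    return closed


def characteristic_set(theory):
    """Models that are not the AND of other models of the (Horn) theory."""
    theory = set(theory)
    return {v for v in theory if v not in closure(theory - {v})}


def horn_core2(char1, char2):
    """Characteristic set of one Horn core of Sigma_1 minus Sigma_2."""
    sigma2 = closure(char2)
    char1 = list(char1)
    # Step 0: the set S of (3.7), in a fixed order without duplicates.
    s = list(dict.fromkeys(char1 + [meet(v, w) for v in char1 for w in char1]))
    q = []
    for w in s:                                   # Step 1: greedy extension of Q
        if not (closure(q + [w]) & sigma2):
            q.append(w)
    return characteristic_set(closure(q))         # Step 2
```

For $C^*(\Sigma_1) = [(0101), (1001), (1000)]$ and $C^*(\Sigma_2) = [(0000)]$, the loop accepts (0101), (1001) and (0001) and rejects (1000) and (0000), so the Horn core found is $\{(0101), (1001), (0001)\}$ with characteristic set $\{(0101), (1001)\}$.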



Theorem 6. Let $\Sigma_1$ and $\Sigma_2$ be Horn theories. Then, HORN-CORE2 outputs
the characteristic set of a Horn core of $\Sigma_1 \setminus \Sigma_2$ in
$O(n|C^*(\Sigma_1)|^2(|C^*(\Sigma_1)|^2 + |C^*(\Sigma_2)|))$ time.

It is also possible to decide whether a given theory $\Pi$ is a Horn core of the
difference $\Sigma_1 \setminus \Sigma_2$ in polynomial time. A suitable algorithm is straightforward
by proper modification of algorithm HORN-CORE2. We omit the details.

As for the efficient computation of all Horn cores, we have a negative result.
There is no polynomial total time algorithm for generating all Horn cores, unless
P = NP.

Theorem 7. There is no algorithm which, given $C^*(\Sigma_1)$ and $C^*(\Sigma_2)$ of Horn
theories $\Sigma_1$ and $\Sigma_2$, computes the characteristic sets $C^*(\Pi)$ of all Horn cores
$\Pi$ of $\Sigma_1 \setminus \Sigma_2$ in polynomial total time (i.e., polynomial in the combined size of
the input and the output), unless P = NP.

5 Conclusion and Further Work 

The results of the present and companion papers m establish operations to- 
wards a Boolean “calculus” on Horn theories, in which Horn theories are com- 
bined using the operations of conjunction (A), disjunction (V), and difference 
(\). These operations are useful, for example, in the context of changing theories 
that are possible world representations of a state of affairs. 

Several issues remain for our ongoing work. One is giving a precise account to 
computing Horn cores and the Horn envelope of the complement $\overline{\Sigma} = \{0,1\}^n \setminus \Sigma$
of a Horn theory. Our results imply polynomial time algorithms in some cases,
but a complete picture remains to be drawn. Another issue is a more accurate ac- 
count of the complexity of the polynomial cases in Table ^ Under formula-based 
representation, all these problems are complete for P under logspace reductions; 
this is an easy consequence of the fact that deciding the satisfiability of a Horn 
CNF is complete for P under logspace reductions. Therefore, parallelization of 
these problems is most likely not possible. Another interesting issue is a study 
of the effect of Horn renamings (cf. P ) . 

Acknowledgments. The authors appreciate the comments of the anonymous re- 
viewers. 
Abstract. Quantum algorithms for factoring and finding discrete logarithms
have previously been generalized to finding hidden subgroups of
finite Abelian groups. This paper explores the possibility of extending 
this general viewpoint to finding hidden subgroups of noncommutative 
groups. We present a quantum algorithm for the special case of dihe- 
dral groups which determines the hidden subgroup in a linear number of 
calls to the input function. We also explore the difficulties of developing 
an algorithm to process the data to explicitly calculate a generating set 
for the subgroup. A general framework for the noncommutative hidden 
subgroup problem is discussed and we indicate future research directions. 



1 Introduction 

All known quantum algorithms which run super-polynomially faster than the 
most efficient probabilistic classical algorithm solve special cases of what is called 
the Abelian Hidden Subgroup Problem. This general formulation includes Shor’s 
celebrated algorithms for factoring and finding discrete logarithms. A very
natural question to ask is if quantum computers can efficiently solve the Hidden 
Subgroup Problem in noncommutative groups. This question has been raised 
regularly and seems important for at least three reasons. 

The first reason is that determining if two graphs are isomorphic reduces to 
finding hidden subgroups of symmetric groups. The second reason is that the 
noncommutative hidden subgroup problem arguably represents a most natural 
line of research in the area of quantum algorithmics. The third reason is that 
an efficient quantum algorithm for a hidden subgroup problem could potentially 
be used to show an exponential gap between quantum and classical two-party 
probabilistic communication complexity models.

The heart of the idea behind the quantum solution to the Abelian hidden sub- 
group problem is Fourier analysis on Abelian groups. The difficulties of Fourier 
analysis on noncommutative groups make the noncommutative version of the
problem very challenging. 

* Supported in part by BRICS, Basic Research in Computer Science, Centre of the
Danish National Research Foundation. 
In this paper, we present the first known quantum algorithm for a noncommutative
subgroup problem. We focus on dihedral groups because they are
well-structured noncommutative groups, and because they contain an exponentially
large number of different subgroups of small order, making classical guessing 
infeasible. Our main result is that there exists a quantum algorithm that solves 
the dihedral subgroup problem using only a linear number of evaluations of the 
function which is given as input. This is the first time such a result has been 
obtained for a noncommutative group. 

However, we hasten to add that our algorithm does not run in polynomial 
time, even though it only uses few evaluations of the given function. The rea- 
son for this is as follows: Our algorithm first applies a certain polynomial-time 
quantum subroutine a linear number of times, each time producing some out- 
put data, and each time using just one application of the given input function. 
The collection of all the output data determines the hidden subgroup with high 
probability. We know how to find the subgroup from those data in exponential 
time, but we do not know if this task can be done efficiently. 

Three important questions are left open. The first question is if there exists 
a polynomial-time algorithm (classical or quantum) to postprocess the output 
data from our quantum subroutine. The second is whether our algorithm can be 
used to show an exponential gap between quantum and classical probabilistic 
communication complexity models, as mentioned above. Currently, the state-of- 
the-art is an exponential separation between error-free models, and a quadratic 
separation between probabilistic models. The third open question is for what
other noncommutative groups similar results can be obtained. 



2 Algorithm for Dihedral Groups 

The Hidden Subgroup Problem is defined as follows: 

Given: A function $\gamma : G \to R$, where $G$ is a finite group and $R$ an arbitrary
finite range.

Promise: There exists a subgroup $H \leq G$ such that $\gamma$ is constant and distinct
on the left cosets of $H$.

Problem: Find a generating set for $H$.

We say of such a function $\gamma$ that it fulfills the subgroup promise with respect
to $H$. We also say of $\gamma$ that it has hidden subgroup $H$. Note that we are not given
the order of $H$. Without loss of generality we assume $\gamma$ is constant and distinct
on left cosets because we may formally rename group elements and convert
multiplication on the right to multiplication on the left. We assume throughout
this paper that function $\gamma$ is given as a black box, so that it is not possible to
obtain knowledge about it by any other means than evaluating it on points in
its domain.

If G is Abelian, then we refer to this problem as the Abelian Subgroup 
Problem. Similarly, if the given group is dihedral, then we refer to it as the 
Dihedral Subgroup Problem. Classically, if $\gamma$ is given as a black box, then the
Abelian subgroup problem is intractable: If $G = \mathbb{Z}_2^n$, then just to determine
if $H$ is non-trivial or not takes time exponential in $n$. Here, $\mathbb{Z}_2$ denotes the
cyclic group of order 2. In contrast, the Abelian subgroup problem can be solved
efficiently on a quantum computer



Sf 9 e:i IK IH IK 



Theorem 1 (Abelian case). Let $\gamma : G \to R$ be a function that fulfills the
Abelian subgroup promise with respect to $H$. There exists a quantum algorithm
that outputs a subset $X \subseteq H$ such that $X$ is a generating set for $H$ with
probability at least $1 - 1/|G|$, where $|G|$ denotes the order of $G$. The algorithm uses
$O(\log|G|)$ evaluations of $\gamma$, and runs in time polynomial in $\log|G|$ and in the
time required to compute $\gamma$.



We review the quantum solution to the Abelian subgroup problem in terms 
of group representation theory in Section 4 below. For other reviews, see for
example m- 

The dihedral group of order $2N$ is the symmetry group of an $N$-sided polygon.
It is isomorphic to a semidirect product of the two cyclic groups $\mathbb{Z}_N$ and
$\mathbb{Z}_2$ of order $N$ and 2, respectively,

$$D_N = \mathbb{Z}_N \rtimes_\phi \mathbb{Z}_2 \qquad (1)$$

with multiplication defined by

$$(a_1, b_1)(a_2, b_2) = (a_1 + \phi(b_1)(a_2),\; b_1 + b_2).$$

The homomorphism $\phi : \mathbb{Z}_2 \to \mathrm{Aut}(\mathbb{Z}_N)$ is given by $1 \mapsto \phi(1)(a) = -a$. An element
$(a, b) \in D_N$ is a rotation if $b = 0$, and a reflection if $b = 1$. The group
contains $N$ rotations and $N$ reflections, and the $N$ rotations comprise the cyclic
subgroup $\mathbb{Z}_N \times \{0\} \leq D_N$ of index 2.
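
The multiplication rule can be made concrete with a few lines of Python. This is a minimal sketch, assuming elements are stored as pairs (a, b) with a taken modulo N and b modulo 2; the function name `dihedral_mul` is ours and not from the paper.

```python
def dihedral_mul(x, y, N):
    """Multiply x = (a1, b1) and y = (a2, b2) in the dihedral group of order 2N."""
    a1, b1 = x
    a2, b2 = y
    # phi(b1) acts on a2 as the identity if b1 == 0 and as negation if b1 == 1.
    twisted = a2 if b1 == 0 else -a2
    return ((a1 + twisted) % N, (b1 + b2) % 2)

N = 8
elements = [(a, b) for a in range(N) for b in (0, 1)]
rotations = [g for g in elements if g[1] == 0]
reflections = [g for g in elements if g[1] == 1]

# A rotation and a reflection do not commute in general.
left = dihedral_mul((1, 0), (0, 1), N)
right = dihedral_mul((0, 1), (1, 0), N)
```

Every reflection squares to the identity (0, 0), which is why a single reflection generates a subgroup of order 2.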

Theorem 2 (Main theorem). Let $\gamma : D_N \to R$ be a function that fulfills the
dihedral subgroup promise with respect to $H$. There exists a quantum algorithm
that, given $\gamma$, uses $O(\log N)$ evaluations of $\gamma$ and outputs a subset $X \subseteq H$ such
that $X$ is a generating set for $H$ with probability at least $1 - \frac{2}{N}$.

Theorem 2 constitutes our main result that the dihedral subgroup problem
can be solved with few applications of the given function $\gamma$. The essential part
of the proof is that it is possible to find a hidden reflection.

Theorem 3 (Finding a reflection). Let $\gamma : D_N \to R$ be a function that fulfills
the dihedral subgroup promise with respect to $H$. Suppose we are promised that
$H = \{0\}$ is either trivial, or $H = \{0, r\}$ is generated by a reflection $r \in D_N$.
Then there exists a quantum algorithm that, given $\gamma$, outputs either “trivial” or
the reflection $r$. If $H$ is trivial then the output is always “trivial”, otherwise the
algorithm outputs $r$ with probability at least $1 - \frac{1}{2N}$. The algorithm uses at most
$89 \log(N) + 7$ evaluations of $\gamma$, but it runs in time $O(\sqrt{N})$.
In Theorem 3 suppose $N$ is even and consider the decision problem of determining,
for a hidden reflection $r = (k_0, 1)$, whether integer $k_0$ is even or odd.
By that theorem, we can solve this decision problem with vanishingly small error
probability using a number of evaluations of $\gamma$ linear in $\log N$. In contrast,
this decision problem is infeasible classically: just to obtain success probability
$1/2 + 2^{-n/3}$ requires more than $2^{n/3}$ evaluations of $\gamma$, where $n = \log(2N)$.¹
The reduction of the general problem given in Theorem 2 to the special case of
order-2 subgroups in Theorem 3 can be found in the appendix, so from now on,
we consider only hidden subgroups of order 2 and of order 1.

We assume that the reader is familiar with the basic notions of quantum
computation. The quantum algorithm we shall use to prove Theorem 3 is

$$V_\gamma = (F_N \otimes W \otimes I) \circ U_\gamma \circ (F_N^{-1} \otimes W \otimes I). \qquad (2)$$

Here, $I$ is the identity operator and $U_\gamma$ is any unitary operator that satisfies that

$$U_\gamma |a\rangle|b\rangle|0\rangle = |a\rangle|b\rangle|\gamma(a,b)\rangle \qquad (3)$$

for all elements $(a, b) \in D_N$. The operator $F_N = \frac{1}{\sqrt{N}} \sum_{i,j=0}^{N-1} \omega_N^{ij} |i\rangle\langle j|$ is the
quantum Fourier transform for the cyclic group $\mathbb{Z}_N$, where $\omega_N = \exp(2\pi\sqrt{-1}/N)$
is the $N$th principal root of unity. When $N = 2$, then the Fourier transform $F_2$
is equal to the Walsh-Hadamard transform $W$ which maps a qubit in state $|b\rangle$
to the superposition $\frac{1}{\sqrt{2}}(|0\rangle + (-1)^b |1\rangle)$.

Suppose for a moment that we were not given a function defined on the dihedral
group $D_N = \mathbb{Z}_N \rtimes_\phi \mathbb{Z}_2$, but instead a function defined on the Abelian
group $\mathbb{Z}_N \times \mathbb{Z}_2$. Or equivalently, suppose for the moment that $\phi : \mathbb{Z}_2 \to \mathrm{Aut}(\mathbb{Z}_N)$
is the trivial homomorphism. Then by Theorem 1 we can find any hidden subgroup
with probability exponentially close to 1 by applying the experiment

$$(a, b) = \mathcal{M}_{1,2} \circ V_\gamma |0\rangle|0\rangle|0\rangle \qquad (4)$$

a number of $O(\log N)$ times. Here, $\mathcal{M}_{1,2}$ denotes a measurement of the first two
registers with outcome $(a, b)$. A natural question to ask is, how much information,
if any, would we gain by performing the experiment given in (4) when $\gamma$ is defined
on $D_N$ and not on $\mathbb{Z}_N \times \mathbb{Z}_2$. Rewriting the state $V_\gamma|0\rangle|0\rangle|0\rangle$ as a superposition
over the basis states shows that we indeed learn something, as quantified in the
following lemma.

Lemma 4. Let $\gamma : D_N \to R$ fulfill the subgroup promise with respect to
$H = \{0, r\}$, where $r = (k_0, 1)$ is a reflection. Then, if we apply quantum algorithm $V_\gamma$
on the initial state $|0\rangle|0\rangle|0\rangle$, the probability that a measurement of the first two
registers yields $(a, 0)$, is

$$\frac{1}{2N}\bigl(1 + \cos(2\pi k_0 a/N)\bigr) = \frac{1}{N} \cos^2(\pi k_0 a/N). \qquad (5)$$

Furthermore, the probability that the outcome is $(a, 1)$, is $\frac{1}{N}\sin^2(\pi k_0 a/N)$.
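
The probabilities in Lemma 4 can be checked numerically for small $N$. The sketch below is our own modelling of the experiment, not code from the paper: it assumes that the first stage prepares the uniform superposition over all $(a, b)$, that the third register labels the left coset $\{(c, 0), (c + k_0, 1)\}$ so that different cosets never interfere, and that the final stage applies the cyclic Fourier transform and the Walsh-Hadamard transform to the first two registers. The function name `outcome_distribution` is hypothetical.

```python
import cmath
import math

def outcome_distribution(N, k0):
    """Exact probabilities of measuring (x, y) in the first two registers."""
    omega = cmath.exp(2j * math.pi / N)
    probs = {}
    for x in range(N):
        for y in (0, 1):
            total = 0.0
            # One orthogonal branch per coset {(c, 0), (c + k0, 1)}.
            for c in range(N):
                amp = (omega ** (x * c) + (-1) ** y * omega ** (x * (c + k0))) / (2 * N)
                total += abs(amp) ** 2
            probs[(x, y)] = total
    return probs

N, k0 = 12, 5
probs = outcome_distribution(N, k0)
```

Comparing `probs[(a, 0)]` with `cos(pi*k0*a/N)**2 / N` and `probs[(a, 1)]` with `sin(pi*k0*a/N)**2 / N` reproduces the two expressions of the lemma up to floating-point error.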

¹ The classical algorithm fails since there are $N$ possible hidden reflections, and the
algorithm can rule out at most $T^2$ of them by using $T$ queries to $\gamma$. This argument
is similar to the one used for the Abelian group $\mathbb{Z}_2^n$.
Let $Z$ denote the discrete random variable defined by the probability mass
function

$$\mathrm{Prob}[Z = z] = \alpha \cos^2(\pi k_0 z/N) \qquad (0 \le z < N), \qquad (6)$$

where $\alpha = 1/N$ if $k_0 = 0$ or $2k_0 = N$, and $\alpha = 2/N$ otherwise. Lemma 4
provides us with a quantum algorithm for sampling from $Z$. Intuitively, since
the random variable $Z$ depends on $k_0$, the more samples we draw from $Z$, the
more knowledge we gather about $k_0$ and the hidden reflection $r = (k_0, 1)$. The
crucial question therefore becomes, how many samples from $Z$ do we need, to
be able to identify $k_0$ correctly with high probability. Theorem 5 below states
that we only need a logarithmic number of samples. We postpone its proof till
the next section.

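Theorem 5. Let $m \ge \lceil 64 \ln N \rceil$, and let $z_1, \ldots, z_m$ be $m$ independent samples
from $Z$. Let $\kappa \in \{1, \ldots, \lfloor N/2 \rfloor\}$ be such that the sum $\sum_{i=1}^{m} \cos(2\pi\kappa z_i/N)$ is
maximal. Then $\kappa = \min\{k_0, N - k_0\}$ with probability at least $1 - \frac{1}{2N}$.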

Proof (of Theorem 3). The algorithm starts by disposing of the possibility that
$r = (0, 1)$ by evaluating $\gamma(0, 0)$ and $\gamma(0, 1)$. If the two values are equal, then the
algorithm outputs the reflection $(0, 1)$ and stops. If $N$ is even, then the algorithm
proceeds by disposing of the possibility that $r = (N/2, 1)$, too.

Now, the algorithm applies the quantum experiment given in (4) a number
of $m' = 2\lceil 64 \ln N \rceil$ times. Let $m$ denote the number of times it measures zero in
the second register. Let $\{a_1, \ldots, a_m\}$ denote the outcomes in the first register,
conditioned on the measurement of the second register yielding a zero.²
Suppose $m \ge m'/2$. The algorithm continues with classical post-processing:
It finds $1 \le \kappa \le \lfloor N/2 \rfloor$ such that the sum $\sum_{i=1}^{m} \cos(2\pi\kappa a_i/N)$ is maximized.
It then computes $\gamma(\kappa, 1)$ and compares it with $\gamma(0, 0)$. If they are equal, it
outputs the reflection $(\kappa, 1)$ and stops. Otherwise, it performs the same test
for $\gamma(N - \kappa, 1)$. If that one also fails, it outputs “trivial”.

If $m < m'/2$, then the algorithm performs the same classical post-processing,
except that it uses the $m' - m$ measurements for which the output in the second
register is 1, and except that it now seeks to maximize $\sin(2\pi\kappa a_i/N)$.

If $H$ is trivial, then the algorithm returns “trivial” with certainty. If
$H = \{0, r\}$, then it outputs $r = (k_0, 1)$ with probability at least $1 - 1/2N$ by Theorem 5.
The total number of evaluations of $\gamma$ is at most $m' + 5 \le 89 \log N + 7$.
Unfortunately, we do not know how to find $\kappa$ any faster than in time $O(\sqrt{N})$. □
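
The classical post-processing step can be illustrated by simulating the samples directly. The following is a rough sketch, not the authors' implementation: it draws samples classically from a distribution proportional to $\cos^2(\pi k_0 z/N)$ (standing in for the quantum experiment), and it finds the maximizing $\kappa$ by brute force over all candidates, which takes time linear in $N$ per sample. The names `sample_z` and `recover_kappa`, the seed, and the values of $N$ and $k_0$ are illustrative assumptions.

```python
import math
import random

def sample_z(N, k0, m, rng):
    """Draw m samples z with probability proportional to cos^2(pi * k0 * z / N)."""
    weights = [math.cos(math.pi * k0 * z / N) ** 2 for z in range(N)]
    return rng.choices(range(N), weights=weights, k=m)

def recover_kappa(N, samples):
    """Return kappa in {1, ..., floor(N/2)} maximizing sum_i cos(2*pi*kappa*z_i/N)."""
    def score(kappa):
        return sum(math.cos(2 * math.pi * kappa * z / N) for z in samples)
    return max(range(1, N // 2 + 1), key=score)

N, k0 = 64, 59
m = math.ceil(64 * math.log(N))      # number of samples, as in the sample bound above
rng = random.Random(0)               # fixed seed so the run is reproducible
kappa = recover_kappa(N, sample_z(N, k0, m, rng))
```

With overwhelming probability `kappa` equals `min(k0, N - k0)`; the algorithm then needs one or two further evaluations of $\gamma$ to decide between $k_0$ and $N - k_0$.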

3 Proof of Theorem 5

The proof of Theorem 5 requires two lemmas, the first of them being a result
by Hoeffding on the sum of bounded random variables. Hoeffding’s lemma
says that the probability that the sum of $m$ independent samples is off from
its expected value by a constant fraction in $m$ drops exponentially in $m$.



² Alternatively, we could apply amplitude amplification to ensure that we will
always measure 0 in the second register, instead of as here, only with probability 1/2.
Lemma 6 (Hoeffding). Let $X_1, \ldots, X_m$ be independent identically distributed
random variables with $\ell \le X_i \le u$. Then, for all $\alpha > 0$,

$$\mathrm{Prob}[S - E[S] \ge \alpha m] \le e^{-2\alpha^2 m/(u-\ell)^2},$$

where $S = \sum_{i=1}^{m} X_i$.

Let $0 < k < N$, and suppose we want to test if $k = k_0$ or $k = N - k_0$, where
$k_0$ is given as in Lemma 4. Clearly, we can answer that question just by testing
if $\gamma(0, 0) = \gamma(k, 1)$ or $\gamma(0, 0) = \gamma(N - k, 1)$. Lemma 4 provides us with another
probabilistic method: First draw $m$ samples $\{z_i\}_{i=1}^{m}$ from $Z$, and then compute
the sum $\sum_{i=1}^{m} \cos(2\pi k z_i/N)$. Conclude that $k \ne k_0$ and $k \ne N - k_0$ if and only
if that sum is at most $m/4$.

Lemma 7. Fix an integer $k$ with $0 < k < N$. Let $z_1, \ldots, z_m$ be $m$ independent
samples from $Z$. Then with probability at most $e^{-m/32}$ we have

$$\sum_{i=1}^{m} \cos(2\pi k z_i/N) \le m/4$$

if $k = k_0$ or $k = N - k_0$, and

$$\sum_{i=1}^{m} \cos(2\pi k z_i/N) \ge m/4$$

otherwise.

Proof. Let $f$ denote the function of $Z$ defined by $f(z) = \cos(2\pi k z/N)$, and let
$X = f(Z)$ denote the random variable defined by $f$. Then $-1 \le X \le 1$ and the
expected value of $X$ is

$$E[X] = \begin{cases} 1 & \text{if } 2k = 2k_0 = N \\ \frac{1}{2} & \text{if either } k = k_0 \text{ or } k = N - k_0 \\ 0 & \text{otherwise.} \end{cases}$$

If $k \ne k_0$ and $k \ne N - k_0$, then apply Hoeffding’s lemma on $m$ independent
random variables all having the same probability distribution as $X$. If $k = k_0$ or
$k = N - k_0$, then apply Hoeffding’s lemma on $m$ independent random variables all
having the same probability distribution as the random variable $E[X] - X$. □
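
For concreteness, the arithmetic behind the exponent $m/32$ can be spelled out as follows. This worked step is ours and assumes Hoeffding's bound in the standard form $\mathrm{Prob}[S - E[S] \ge \alpha m] \le \exp(-2\alpha^2 m/(u-\ell)^2)$ for variables taking values in $[\ell, u]$; here $S$ denotes the sum of the $m$ cosine values and $Y_i$ is an auxiliary name for the shifted variables.

```latex
% Assumed form of Hoeffding's bound:
%   Prob[S - E[S] >= alpha m] <= exp(-2 alpha^2 m / (u - l)^2).
\begin{align*}
\text{Case } k \neq k_0,\ k \neq N-k_0:\quad
  & E[S] = 0,\quad u - \ell = 2,\quad \alpha = \tfrac14, \\
  & \mathrm{Prob}\bigl[S \ge \tfrac{m}{4}\bigr]
    \le \exp\!\Bigl(-\frac{2\,(1/4)^2\, m}{2^2}\Bigr) = e^{-m/32}. \\[4pt]
\text{Case } k = k_0 \text{ or } k = N-k_0\ (2k_0 \neq N):\quad
  & Y_i = E[X] - X_i \in \bigl[-\tfrac12, \tfrac32\bigr],\quad
    E\Bigl[\textstyle\sum_i Y_i\Bigr] = 0, \\
  & \mathrm{Prob}\bigl[S \le \tfrac{m}{4}\bigr]
    = \mathrm{Prob}\Bigl[\textstyle\sum_i Y_i \ge \tfrac{m}{4}\Bigr]
    \le e^{-m/32}.
\end{align*}
```

In the second case the range of $Y_i$ still has width 2, so the same exponent results.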

We are not only concerned about testing for a specific $0 < k \le N/2$ if
$k = k_0$ or $k = N - k_0$, but in testing every one of them. Fortunately, the probability
$e^{-m/32}$ is diminutive, so we can reuse the same $m$ samples in all $N/2$ tests,
and still it is very likely that the sum $\sum_{i=1}^{m} \cos(2\pi k z_i/N)$ is larger than $m/4$ if
and only if $k = k_0$ or $k = N - k_0$.

Proof (of Theorem 5). This is a simple consequence of Lemma 7. Let
$k_0' = \min\{k_0, N - k_0\}$. The probability that $\sum_{i=1}^{m} \cos(2\pi k_0' z_i/N) \le m/4$ is at most
$e^{-m/32}$. Furthermore, for every integer $0 < k \le N/2$ not equal to $k_0'$, the
probability that $\sum_{i=1}^{m} \cos(2\pi k z_i/N) \ge m/4$ is also at most $e^{-m/32}$. If $\kappa \ne k_0'$, then
one of these $\lfloor N/2 \rfloor$ events must have happened, and the probability for that is
upper bounded by $\lfloor N/2 \rfloor\, e^{-m/32} \le \frac{1}{2N}$. □
4 Abelian Hidden Subgroups

Theorem 1 in Section 2 states that the Abelian subgroup problem can be solved
efficiently on a quantum computer. The algorithm which accomplishes this is 
most easily understood using some basic representation theory for finite Abelian 
groups which we now briefly review. For more details see the excellent refer- 
ences Ildll4l . For any Abelian group G the group algebra C[G] is the Hilbert 
space of all complex-valued functions on G equipped with the standard inner 
product. A character of $G$ is a homomorphism from $G$ to $\mathbb{C}^*$. The set of characters
admits a natural group structure via pointwise multiplication and is a basis
for the group algebra. The Fourier transform is the linear transformation from
the point mass basis of the group algebra to the basis of characters. Finally, for
any subgroup $H \leq G$, there exists a subgroup of the character group called the
orthogonal subgroup $H^\perp$ which consists of all characters $\chi$ such that $\chi(h) = 1$
for all $h \in H$.

We now sketch the quantum algorithm for solving the Abelian hidden subgroup
problem. In the interest of clarity we omit all normalization factors in our
description. The state of the computer is initialized in the superposition

$$\sum_{g \in G} |g\rangle|\gamma(g)\rangle.$$

We then observe the second register with outcome, say, $q \in R$. This action serves
to place the first register into a superposition of all elements that map to $q$
under $\gamma$. Because $\gamma$ is constant and distinct on left cosets of $H$ we may write the
new state of the computer as

$$\sum_{h \in H} |s + h\rangle|q\rangle$$

for some coset $s + H$ chosen by the observation of the second register. We then
apply the quantum Fourier transform on the first register, producing the state

$$\sum_{\chi \in H^\perp} \chi^*(s)\,|\chi\rangle|q\rangle,$$

where $\chi^*(s)$ denotes the complex conjugate of $\chi(s)$. Finally, we observe the first
register. Notice that this results in a uniformly random sample from $H^\perp$.

It can easily be shown that by repeating this experiment of order $\log|H^\perp|$
times, we find a subset $X \subseteq H^\perp$ that generates $H^\perp$ with probability exponentially
close to 1. The hidden subgroup $H \leq G$ can then be calculated efficiently
from $H^\perp$ on a classical computer, essentially by linear algebra. In summary,
the sole purpose of the quantum machine in the above algorithm is to sample
uniformly from $H^\perp$. It is known that an arbitrarily good approximation to the
quantum Fourier transform can be performed efficiently so, assuming the
given function $\gamma$ can be computed in polynomial time, the complete
algorithm runs in polynomial time as well.
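
The classical half of this procedure is easy to illustrate for a cyclic group. The sketch below is a toy example under our own assumptions: $G = \mathbb{Z}_N$, the hidden subgroup is generated by a divisor $d$ of $N$, the character $\chi_j$ is identified with the integer $j$ via $\chi_j(x) = \omega_N^{jx}$, and the uniform samples from the orthogonal subgroup are produced classically in place of the quantum machine. The function names are hypothetical.

```python
import math
import random

def orthogonal_subgroup(N, d):
    """Indices j of characters chi_j that are trivial on the subgroup generated by d."""
    return [j for j in range(N) if (j * d) % N == 0]

def recover_hidden_generator(N, samples):
    """From sampled indices j, return a generator of {x : chi_j(x) = 1 for all samples}."""
    g = N
    for j in samples:
        g = math.gcd(g, j)
    return N // g

N, d = 24, 6
h_perp = orthogonal_subgroup(N, d)
rng = random.Random(1)
samples = [rng.choice(h_perp) for _ in range(20)]   # stand-in for the quantum sampler
generator = recover_hidden_generator(N, samples)
```

Once the samples generate the whole orthogonal subgroup, `generator` equals `d`; with too few samples it is a proper divisor of `d` and describes a larger subgroup that still contains the hidden one.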
5 A Generalized $H^\perp$

We now briefly discuss the main ideas of harmonic analysis on groups, stating
as facts the main results that we require. Let $G$ be a (possibly noncommutative)
finite group. A representation of $G$ is a homomorphism $\rho : G \to GL(V_\rho)$, where $V_\rho$ is called the
representation space of the representation. The dimension of $V_\rho$, denoted $d_\rho$, is
called the dimension of the representation.

The representation $\rho$ is irreducible if the only invariant subspaces of $V_\rho$ are 0
and $V_\rho$ itself. Two representations $\rho_1$ and $\rho_2$ are equivalent if there exists an
invertible linear map $S : V_{\rho_1} \to V_{\rho_2}$ such that $\rho_1(g) = S^{-1} \rho_2(g) S$ for all $g \in G$.

Let $\Gamma = \{\rho_1, \rho_2, \ldots, \rho_r\}$ be a complete set of inequivalent, irreducible representations
of $G$. Then the identity $\sum_{i=1}^{r} d_{\rho_i}^2 = |G|$ holds. Furthermore, we may
assume that the representations are unitary, i.e., that $\rho(g)$ is a unitary matrix for
all $g \in G$ and all $\rho \in \Gamma$. The functions defined by $\rho_{ij}(g) = \rho(g)_{ij}$ for $1 \le i, j \le d_\rho$
are called matrix coefficients, and by the previous identity it follows that there
are $|G|$ matrix coefficients. It is a fundamental fact that the set of all normalized
matrix coefficients obtained from any fixed $\Gamma$ is an orthonormal basis of
the group algebra $\mathbb{C}[G]$. The Fourier transform with respect to a chosen $\Gamma$ is
a change of basis transformation of the group algebra from the basis of point
masses to the basis of matrix coefficients.

If $G$ is commutative, then these definitions reduce to those discussed in Section 4,
since in that case, all representations are 1-dimensional and each matrix
coefficient is just a character. If $G$ is noncommutative, then there exists at
least one irreducible representation of $G$ with dimension greater than 1, and in this case
the Fourier transform depends on the choice of bases for the irreducible representations.
It seems as though this is what complicates the extension of the
quantum algorithm for commutative groups to the noncommutative scenario.

It turns out that for our present application it is most useful to use an
equivalent notion of the Fourier transform. One may also think of the matrix
coefficients as collected together in matrices. In this view the Fourier transform
is a matrix-valued function on $\Gamma$. For each $f \in \mathbb{C}[G]$, we define the value of the
Fourier transform at an irreducible representation $\rho \in \Gamma$ to be

$$\hat{f}(\rho) = \sum_{g \in G} f(g)\rho(g).$$

If we take individual entries of these matrices, then we recover the coefficients in
the basis of matrix coefficients. There is a Fourier inversion formula and therefore
$f$ is determined by the matrices $\{\hat{f}(\rho)\}_{\rho \in \Gamma}$.

We may now describe the noncommutative version of $H^\perp$. Let $V_\rho^H$ be the
elements of $V_\rho$ that are pointwise fixed by $H$,

$$V_\rho^H = \{v \in V_\rho \mid \rho(h)v = v \text{ for all } h \in H\}.$$

Let $P_\rho^H$ be the projection operator onto $V_\rho^H$. Then define

$$H^\perp = \{P_\rho^H\}_{\rho \in \Gamma}.$$

The significance of this definition follows from the following elementary result. 

Theorem 8. Let $I_H$ be the indicator function on the subgroup $H \leq G$. Then,
for all p G r, we have that Ih{p) = ■ 

Corollary 9. Let $sH$ be any coset of $H \leq G$. Then Theorem 8 immediately
yields, for all p G T, we have Ish{p) = p{s)P^ . 
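
The projection onto the vectors fixed by $H$ can be computed by averaging the representation over $H$, which is a standard fact from representation theory. The sketch below illustrates this for a hidden reflection in a dihedral group. It rests on our own assumptions: the two-dimensional real representation by plane rotations and reflections is chosen for illustration, elements are pairs (a, b) as in the dihedral group defined earlier, and the helper names `rho`, `matmul` and `averaged_operator` are hypothetical.

```python
import math

def rho(g, N):
    """A 2-dimensional representation of the dihedral group: rotations and reflections."""
    a, b = g
    t = 2 * math.pi * a / N
    c, s = math.cos(t), math.sin(t)
    # (a, 0) is a rotation by angle t; (a, 1) is that rotation composed with diag(1, -1).
    return [[c, -s], [s, c]] if b == 0 else [[c, s], [s, -c]]

def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def averaged_operator(H, N):
    """Average of rho(h) over h in H; this is the projection onto the H-fixed vectors."""
    mats = [rho(h, N) for h in H]
    return [[sum(M[i][j] for M in mats) / len(H) for j in range(2)] for i in range(2)]

N, k0 = 10, 3
H = [(0, 0), (k0, 1)]          # subgroup generated by the reflection (k0, 1)
P = averaged_operator(H, N)
```

Here `P` is idempotent, is left unchanged by multiplication with `rho((k0, 1), N)`, and has trace 1, so the fixed space of this representation is one-dimensional.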

Let us briefly summarize the role of this result in the quantum algorithm. 
If we straight-forwardly apply the quantum algorithm described in the previ- 
ous section to the case where G is noncommutative, then we must determine 
the resulting probability amplitudes and the information gained by sampling 
according to these amplitudes. 

Recall that the state of the quantum system after the first observation is a
superposition of states corresponding to the members of one coset. Thus the state
may be described by the indicator function of a coset, $I_{sH}$. The final observation
results in observing the name of a matrix coefficient $|\rho, i, j\rangle$. The probability of
observing $|\rho, i, j\rangle$ is given by $|c_{\rho,i,j}|^2$, where $c_{\rho,i,j}$ is the coefficient of $\rho_{ij}$ in the
expansion of $I_{sH}$ in the basis of matrix coefficients. The corollary above allows
us, in theory, to compute these probability amplitudes.

The algorithm described in the first part of this paper may be derived from 
these general methods. For a general noncommutative group it seems that these 
methods are necessary for an analysis of the resulting probability amplitudes. 
Appendix: Proof of Theorem 2

Proof (of Theorem 2). The following commutative diagram illustrates our approach:

$$\begin{array}{ccccc}
H_1 & \hookrightarrow & H & \twoheadrightarrow & H/H_1 \\
\downarrow & & \downarrow & & \downarrow \\
\mathbb{Z}_N \times \{0\} & \hookrightarrow & \mathbb{Z}_N \rtimes_\phi \mathbb{Z}_2 = D_N & \twoheadrightarrow & D_N/H_1
\end{array}$$

First, apply Theorem 1 to produce a subset $X_1 \subseteq H_1 = H \cap (\mathbb{Z}_N \times \{0\})$
such that $X_1$ generates $H_1$ with probability at least $1 - 1/N$ by using $O(\log N)$
queries to $\gamma$. Let $x_1 \in X_1$ generate $\langle X_1 \rangle$.

The subgroup $\langle x_1 \rangle$ is normal in $D_N$, and the quotient group $D_N/\langle x_1 \rangle$ is
isomorphic to $D_M$ with $M = [\mathbb{Z}_N \times \{0\} : \langle x_1 \rangle]$. Define $\gamma_2 : D_N/\langle x_1 \rangle \to R$ by
$\gamma_2(g + \langle x_1 \rangle) = \gamma(g)$. Then $\gamma_2$ has hidden subgroup $H/\langle x_1 \rangle$.

Suppose $\langle x_1 \rangle = H_1$. Then $H/\langle x_1 \rangle$ is either trivial or generated by a
reflection $r_2 + \langle x_1 \rangle$. Apply the algorithm in Theorem 3 with $\gamma_2$ a number of
$t = \lceil \log(2N)/\log(2M) \rceil$ times, ensuring we find $r_2 + \langle x_1 \rangle$ with probability at
least $1 - 1/2N$, provided it exists.

Finally, output $x_1$, and output also the coset representative $r_2 \in D_N$ if it exists.
The overall success probability is at least $(1 - 1/N)(1 - 1/2N) > 1 - 2/N$.
The total number of evaluations of $\gamma$ is at most $O(\log N) + t(89 \log M + 7)$, as
each evaluation of $\gamma_2$ requires just one evaluation of $\gamma$. □
Abstract. In this paper, a simple technique which unifies the known approaches
for proving lower bound results on the size of deterministic, nondeterministic,
and randomized OBDDs and $k$OBDDs is described.

This technique is applied to establish a generic lower bound on the size of randomized
OBDDs with bounded error for the so-called “$k$-stable” functions which
have been studied in the literature on read-once branching programs and OBDDs
for a long time. It follows by our result that several standard functions are not
contained in the analog of the class BPP for OBDDs.

It is well-known that $k$-stable functions are hard for deterministic read-once
branching programs. Nevertheless, there is no generic lower bound on the size
of randomized read-once branching programs for these functions as for OBDDs.
This is proven by presenting a randomized read-once branching program of polynomial
size, even with zero error, for a certain $k$-stable function. As a consequence,
we obtain that P ≠ ZPP ∩ NP ∩ coNP for the analogs of these classes
defined in terms of the size of read-once branching programs.



1 Introduction 

Branching programs (BPs) are established as a standard model for the study of space-bounded
computations. Basic definitions are given in the next section.

OBDDs (ordered binary decision diagrams) are a restricted type of branching programs
which have been introduced by Bryant as a data structure for Boolean functions
and turned out to be extremely useful in various fields of application. Jain, Bitner,
Abadir, and Fussell have extended OBDDs to $k$IBDDs in order to have succinct
representations for a larger class of functions. Roughly speaking, a $k$IBDD is a branching
program which can be decomposed into at most $k$ layers such that each layer is
an OBDD (possibly with a different variable ordering for each layer). A $k$OBDD is a
$k$IBDD where the variable orderings have to be identical for all layers.

Apart from practical issues, these restricted types of branching programs are also
interesting as objects of theory. The first exponential lower bounds for OBDDs are due
to Bryant. Jukna, Krause, and Gergov have proven exponential lower
bounds even for $k$OBDDs. Bollig, Sauerhoff, Sieling, and Wegener have shown that
the classes of sequences of functions representable in polynomial size by $k$IBDDs and
by $k$OBDDs form a proper hierarchy with respect to $k$.

* This work has been supported by DFG grant We 1066/8-1. 
Here we are concerned with the nondeterministic and probabilistic modes of computation
for branching programs. Although nondeterministic variants of branching programs
are well-known (even the first model of Shannon has in fact been nondeterministic),
the probabilistic mode of computation has only recently gained attention.

The complexity theoretical analysis of randomized OBDDs has been launched by
Ablayev and Karpinski. They have presented a function which is representable by
randomized OBDDs of polynomial size with small one-sided error, but which has exponential
size for deterministic OBDDs and even deterministic $k$OBDDs. Later on, they
have extended the lower bound also to nondeterministic $k$OBDDs. Lower bounds
for randomized OBDDs have been proven independently by Ablayev and the author.
Using these results, the relations between the complexity classes P, NP, RP,
and BPP defined in terms of the size of OBDDs could be completely characterized.
Recently, Karpinski and Mubarakzjanov have also resolved the relation
between the classes P and ZPP for OBDDs by using a similar result for one-way
communication complexity. They have shown that, surprisingly, these classes coincide
for OBDDs. A lower bound technique for the more general case of randomized
read-once and randomized read-$k$-times BPs has been described in the literature.

A large part of the known lower bound proofs on the size of branching programs
either explicitly or implicitly rely on results from communication complexity theory.
These communication complexity theoretical proof techniques have been applied in different
disguises by several people. Hromkovic has already described a “unifying”
approach to the proof techniques for $k$OBDDs (which can be generalized to the case
of read-$k$-times BPs) using a new measure for communication complexity, so-called
“overlapping” communication complexity.

In this paper, we show that all the known proofs of lower bounds on the size of
deterministic, nondeterministic, and randomized variants of OBDDs and $k$OBDDs boil
down to rectangular reductions in the sense of communication complexity theory. Such
reductions allow one to apply known results on communication complexity to prove results
on the size of OBDDs and $k$OBDDs in a simple way.

We apply this “reduction technique” to the class of so-called “$k$-stable” functions.
The definition of $k$-stable functions goes back to Dunne; the name itself has been
introduced by Jukna. Several authors have observed that such a function has size
$2^k - 1$ for deterministic read-once BPs. To put it intuitively, a $k$-stable function is a
function which has to compute a “pointer function” as a subproblem. By a pointer
function, we mean a function which first computes the index (address) of a variable
from its input and then outputs the value of this variable as the result. We show here
that an arbitrary fc-stable function has size for randomized OBDDs with bounded 
error. It immediately follows for many standard functions from the literature on read- 
once BPs that they are not contained in the analog of the class BPP for OBDDs. 

Since randomization does not seem to help very much for functions which have to
output a single bit of their input as the result, and since $k$-stable functions are known to be
hard for deterministic read-once BPs, it is tempting to conjecture that they are also hard
for randomized read-once BPs. Jukna, Razborov, Savicky, and Wegener have used
a certain $k$-stable function to show that the analogs of the classes P and NP ∩ coNP for
read-once BPs are distinct and have asked whether their function can even be shown to
have exponential size for randomized read-once BPs. Here we answer this question in
the negative sense: It turns out that this function can even be computed by randomized
read-once BPs of polynomial size with zero error. As a consequence of this result, we
obtain that the classes P and ZPP ∩ NP ∩ coNP for read-once BPs are different.

The rest of the paper is organized as follows. In Section 2, we introduce some basic 
notions concerning branching programs. In Section 3, we describe the general proof 
technique. After this, we apply this technique to prove the generic lower bound on the 
size of randomized OBDDs for k-stable functions (Section 4). Section 5 is devoted to
the upper bound result for the function of Jukna, Razborov, Savický, and Wegener.

2 Definitions

We start with a review of basic definitions concerning branching programs and OBDDs. 

Definition 1. A branching program (BP) on the variable set $\{x_1, \ldots, x_n\}$ is a directed
acyclic graph with one source and two sinks, the latter labeled by the constants 0 and 1.
Each non-sink node is labeled by a variable $x_i$ and has two outgoing edges labeled by
0 and 1. This graph represents a Boolean function $f\colon \{0,1\}^n \to \{0,1\}$ in the obvious
way: Call an edge labeled by $c \in \{0,1\}$ leaving a node labeled by $x_i$ activated by a if
$a_i = c$. Then each input $a \in \{0,1\}^n$ corresponds to exactly one path of activated edges
from the source to one of the sinks (called the computation path for a). The value of the
sink reached by this path determines $f(a)$. The size of a branching program G is the
number of its nodes and is denoted by $|G|$.
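
As an illustration of Definition 1, the following minimal Python sketch follows the computation path of an input through a branching program. The dictionary encoding of nodes and the example program `XOR_BP` are assumptions made for this sketch; they do not appear in the paper.

```python
from typing import Dict, Tuple, Union

# Hypothetical encoding (an assumption of this sketch): a node id maps either
# to a sink value (0 or 1) or to a triple
# (variable index, successor on 0, successor on 1).
Node = Union[int, Tuple[int, str, str]]


def evaluate_bp(nodes: Dict[str, Node], source: str, a: Tuple[int, ...]) -> int:
    """Follow the activated edges for input a and return the sink value."""
    node = nodes[source]
    while not isinstance(node, int):      # stop once a sink is reached
        var, succ0, succ1 = node
        node = nodes[succ1 if a[var] == 1 else succ0]
    return node


def bp_size(nodes: Dict[str, Node]) -> int:
    """The size |G| is the number of nodes, sinks included."""
    return len(nodes)


# A branching program for x_0 XOR x_1 with one source and two sinks.
XOR_BP: Dict[str, Node] = {
    "s": (0, "u", "w"),
    "u": (1, "zero", "one"),
    "w": (1, "one", "zero"),
    "zero": 0,
    "one": 1,
}
```

Each input activates exactly one edge per visited node, so the loop traces the unique computation path and terminates because the graph is acyclic.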

A read-once branching program is a branching program where on each path from 
the source to a sink, each variable may appear at most once. 

An OBDD is a branching program with a variable ordering, given by a permutation
$\pi$ on the set $\{1, \ldots, n\}$. On each path from the source to the sinks, the variables at the
nodes have to appear in the order prescribed by $\pi$ (where some variables may be left
out). A $\pi$-OBDD is an OBDD ordered with respect to $\pi$.

A kOBDD is a branching program with a variable ordering $\pi$ whose set of nodes
can be partitioned into k parts (called layers) $C_1, \ldots, C_k$ such that (i) edges starting in
$C_i$ end in $C_j$ with $j \ge i$; and (ii) all edges within each $C_i$ fulfill the ordering restriction
for OBDDs with respect to $\pi$.
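
The read-once and OBDD restrictions can be checked mechanically on small examples. The sketch below enumerates all graph-theoretic source-to-sink paths, so it is only suitable for tiny programs; the node encoding and the two example programs are assumptions of this sketch, not notation from the paper.

```python
def all_paths(nodes, source):
    """Yield, for every source-to-sink path, the list of tested variable indices."""
    stack = [(source, [])]
    while stack:
        node_id, seen = stack.pop()
        node = nodes[node_id]
        if isinstance(node, int):          # sink reached
            yield seen
        else:
            var, succ0, succ1 = node
            stack.append((succ0, seen + [var]))
            stack.append((succ1, seen + [var]))


def is_read_once(nodes, source):
    """Each variable appears at most once on each path."""
    return all(len(p) == len(set(p)) for p in all_paths(nodes, source))


def is_pi_obdd(nodes, source, pi):
    """Variables appear on each path in the order given by the list pi."""
    position = {var: idx for idx, var in enumerate(pi)}
    return all(
        all(position[p[i]] < position[p[i + 1]] for i in range(len(p) - 1))
        for p in all_paths(nodes, source)
    )


# Hypothetical examples: XOR of two variables, and a program testing x_0 twice.
XOR_BP = {"s": (0, "u", "w"), "u": (1, "zero", "one"),
          "w": (1, "one", "zero"), "zero": 0, "one": 1}
TWICE_BP = {"s": (0, "t", "t"), "t": (0, "zero", "one"), "zero": 0, "one": 1}
```

With these examples, `XOR_BP` is an OBDD for the ordering `[0, 1]` but not for `[1, 0]`, and `TWICE_BP` is not read-once.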

Ablayev and Karpinski [2] have introduced randomized OBDDs defined analogously
to probabilistic circuits. In the following, we give a definition for randomized general
branching programs.

Definition 2. Let a branching program G with the following special properties be
given: (i) G has three types of sinks, labeled by 0, 1 or “?”; (ii) G is defined on two
disjoint sets of variables $X = \{x_1, \ldots, x_n\}$ and $Z = \{z_1, \ldots, z_r\}$; and (iii) on each
path from the source to a sink, each variable from Z appears at most once. The
variables from Z are called probabilistic variables, and nodes labeled by such variables are
called probabilistic nodes.

By an obvious extension of the usual semantics for deterministic branching
programs (see above), G represents a function $g\colon \{0,1\}^n \times \{0,1\}^r \to \{0, 1, ?\}$.



On the Size of Randomized OBDDs and Read-Once Branching Programs 






We say that G as a randomized branching program represents a function
$f\colon \{0,1\}^n \to \{0,1\}$ with

- unbounded error if for all $x \in \{0,1\}^n$ it holds that $\Pr_z\{g(x,z) = f(x)\} > 1/2$;

- two-sided error (bounded error) at most $\varepsilon$, where $0 \le \varepsilon < 1/2$, if for all
$x \in \{0,1\}^n$ it holds that $\Pr_z\{g(x,z) = f(x)\} \ge 1 - \varepsilon$;

- one-sided error at most $\varepsilon$, where $0 \le \varepsilon < 1$, if for all $x \in \{0,1\}^n$ it holds that

$\Pr_z\{g(x,z) = 0\} = 1$, if $f(x) = 0$;

$\Pr_z\{g(x,z) = 1\} \ge 1 - \varepsilon$, if $f(x) = 1$;

- zero error and failure probability at most $\varepsilon$, $0 \le \varepsilon < 1$, if for all $x \in \{0,1\}^n$ it
holds that

$\Pr_z\{g(x,z) = 1\} = 0 \wedge \Pr_z\{g(x,z) = ?\} \le \varepsilon$, if $f(x) = 0$;

$\Pr_z\{g(x,z) = 0\} = 0 \wedge \Pr_z\{g(x,z) = ?\} \le \varepsilon$, if $f(x) = 1$.

In these expressions, z is an assignment to the probabilistic variables which is chosen
according to the uniform distribution from $\{0,1\}^r$.
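
For very small programs, the error modes of Definition 2 can be checked by enumerating all assignments z to the probabilistic variables. The following sketch treats g simply as a Python function of (x, z); this representation and the toy functions `f`, `g_las_vegas` and `g_monte_carlo` are assumptions for illustration only.

```python
from fractions import Fraction
from itertools import product


def output_distribution(g, x, r):
    """Exact distribution of g(x, z) for z uniform over {0,1}^r."""
    counts = {0: 0, 1: 0, "?": 0}
    for z in product((0, 1), repeat=r):
        counts[g(x, z)] += 1
    return {out: Fraction(c, 2 ** r) for out, c in counts.items()}


def has_zero_error(g, f, n, r, eps):
    """Zero error with failure probability at most eps."""
    for x in product((0, 1), repeat=n):
        dist = output_distribution(g, x, r)
        if dist[1 - f(x)] != 0 or dist["?"] > eps:
            return False
    return True


def has_two_sided_error(g, f, n, r, eps):
    """Two-sided error at most eps."""
    return all(
        output_distribution(g, x, r)[f(x)] >= 1 - eps
        for x in product((0, 1), repeat=n)
    )


# Toy examples (hypothetical): f is AND on two bits.
def f(x):
    return x[0] & x[1]


def g_las_vegas(x, z):
    # Answers correctly when z_0 = 0 and gives up otherwise.
    return f(x) if z[0] == 0 else "?"


def g_monte_carlo(x, z):
    # Errs exactly when both probabilistic bits are 1.
    return 1 - f(x) if z == (1, 1) else f(x)
```

Here `g_las_vegas` has zero error with failure probability exactly 1/2, and `g_monte_carlo` has two-sided error exactly 1/4.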

Definitions for randomized variants of restricted branching programs are derived from
this in a straightforward way by requiring that the non-probabilistic variables fulfil the
restriction. Randomized OBDDs are thus randomized branching programs with a
variable ordering $\pi$ on the non-probabilistic variables, and the tests of the non-probabilistic
variables on each path from the source to a sink have to be consistent with $\pi$
(analogously for kOBDDs). As for Turing machines, nondeterministic branching programs
occur as a special case of randomized branching programs with one-sided error.

For a type $\mathcal{R} \in \{\mathrm{BP1}, \mathrm{OBDD}, k\mathrm{OBDD}\}$ of restricted branching programs (where
“BP1” stands for read-once BPs), we denote the classes of sequences of functions with
polynomial size deterministic, nondeterministic, and randomized branching programs
with zero, one-sided, bounded or unbounded error of the respective type by P-$\mathcal{R}$, NP-$\mathcal{R}$,
ZPP-$\mathcal{R}$, RP-$\mathcal{R}$, BPP-$\mathcal{R}$, and PP-$\mathcal{R}$, resp. For a class C of sequences of functions, co-C
denotes the class of sequences of functions $(f_n)_{n \in \mathbb{N}}$ with $(\neg f_n)_{n \in \mathbb{N}} \in C$.

In the following, we will also apply well-known concepts from communication
complexity theory. For the definition of the respective notions and a thorough
introduction, we refer to the monographs of Hromkovič [12] and of Kushilevitz and
Nisan.



3 The Reduction Technique 

In this section, we describe the known techniques for proving lower bounds on
deterministic, nondeterministic, and randomized OBDDs and kOBDDs in a unified way.
This general approach is called “reduction technique” here.

To put it intuitively, the known proofs of lower bounds on the size of OBDDs are 
all based on the fact that a large amount of information has to be exchanged across a 
suitably chosen cut in the graph in order to evaluate the represented function. Results 
from communication complexity theory are then explicitly or implicitly used to get 
lower bounds on the necessary amount of information. A similar approach works for
kOBDDs.

Our goal is to clearly separate the communication complexity theoretical part of
these proofs from the conclusions on the size of the OBDD or kOBDD. We will directly
handle the more general case of kOBDDs. The following definition will be used to
establish the connection between the size of kOBDDs and communication complexity
with respect to a fixed partition.

Definition 3. Let $f\colon \{0,1\}^n \to \{0,1\}$ be a function defined on the variable set
$X = \{x_1, \ldots, x_n\}$. Let a variable ordering on X be given by $\pi\colon \{1, \ldots, n\} \to \{1, \ldots, n\}$.
Let $1 \le p \le n - 1$ and $L := \{x_{\pi(1)}, \ldots, x_{\pi(p)}\}$, $R := \{x_{\pi(p+1)}, \ldots, x_{\pi(n)}\}$. Define
the function $f'\colon \{0,1\}^p \times \{0,1\}^{n-p} \to \{0,1\}$ on assignments x to L and y to R by
$f'(x, y) := f(x + y)$, where $x + y$ denotes the joint assignment to X obtained from x
and y. Then we call $f'$ the partitioned version of f with respect to $\pi$ and p and denote
this function by $f^{\pi,p}$.

Since we usually cannot directly analyze the communication complexity of the function
represented by a kOBDD, it is important to be able to identify hard subproblems which
can be handled by the available tools. As for Turing machines, we use a reduction to
show that the whole function is at least as hard as the considered subproblem.

Several notions of reducibility defined analogously to the well-known notions for
Turing machines have been introduced in communication complexity theory. The most
common type is the rectangular reduction, which is the analog of many-one reducibility
for Turing machines.

Definition 4 (Rectangular reduction). Let $X_f, Y_f$ and $X_g, Y_g$ be finite sets. Let
$f\colon X_f \times Y_f \to \{0,1\}$ and $g\colon X_g \times Y_g \to \{0,1\}$ be arbitrary functions. Then we
call a pair $(\varphi_1, \varphi_2)$ of functions $\varphi_1\colon X_f \to X_g$ and $\varphi_2\colon Y_f \to Y_g$ a rectangular
reduction from f to g (or simply “reduction” for short) if $g(\varphi_1(x), \varphi_2(y)) = f(x, y)$ for
all $(x, y) \in X_f \times Y_f$. If such a pair of functions exists for f and g, we say that f is
reducible to g.
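
On finite domains, the condition of Definition 4 can be verified by brute force. The sketch below is a direct transcription of the definition; the one-bit functions `neq` and `eq` and the maps `phi1`, `phi2` are hypothetical examples chosen for this illustration.

```python
def is_rectangular_reduction(f, g, X_f, Y_f, phi1, phi2):
    """Check g(phi1(x), phi2(y)) == f(x, y) for all (x, y) in X_f x Y_f."""
    return all(g(phi1(x), phi2(y)) == f(x, y) for x in X_f for y in Y_f)


# Hypothetical example: one-bit inequality reduces to one-bit equality
# by leaving x unchanged and negating y.
def neq(x, y):
    return int(x != y)


def eq(x, y):
    return int(x == y)


def phi1(x):
    return x


def phi2(y):
    return 1 - y
```

Note that each map depends only on one player's input, which is what allows the players to apply the reduction without communication.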

Now we are ready to formally describe the connection between the size of kOBDDs
and communication complexity.

Lemma 1. Let $g\colon \{0,1\}^n \to \{0,1\}$ be defined on the variable set $X = \{x_1, \ldots, x_n\}$.
Let $\pi$ be a variable ordering on X. Assume that there is a function $f\colon U \times V \to \{0,1\}$,
where U and V are finite sets, and a parameter p with $1 \le p \le n - 1$ such that f is
reducible to the partitioned version $g^{\pi,p}$ of g. Let G be a randomized kOBDD ordered
according to $\pi$ which represents g with two-sided error at most $\varepsilon$. Then it holds that

$$\lceil \log |G| \rceil \ge R_\varepsilon^{2k-1}(f)/(2k-1),$$

where $R_\varepsilon^{2k-1}(f)$ denotes the minimal number of bits exchanged by a randomized
$(2k-1)$-round communication protocol for f with two-sided error at most $\varepsilon$.
Analogous assertions hold for deterministic, nondeterministic, and randomized OBDDs with
zero, one-sided or unbounded error and the corresponding measures for $(2k-1)$-round
communication complexity.







Proof. Since f is reducible to $g^{\pi,p}$, it follows that $R_\varepsilon^{2k-1}(g^{\pi,p}) \ge R_\varepsilon^{2k-1}(f)$ (any
r-round protocol for $g^{\pi,p}$ can be used to define an r-round protocol for f with the same
amount of communication). Hence, it is sufficient to show that
$(2k-1)\lceil \log |G| \rceil \ge R_\varepsilon^{2k-1}(g^{\pi,p})$. To prove this inequality, we construct a randomized
$(2k-1)$-round protocol for $g^{\pi,p}$ from G. The basic ideas behind this construction go back
to Jukna [13] and Krause.

A cut in G is a set C of nodes with the property that each path from the source
to the sinks runs through exactly one node in C. The set which only contains the
source of G and the set containing the sinks are obviously cuts. Call these cuts $C_0$
and $C_{2k}$, resp. Furthermore, using the fact that G is a kOBDD we can choose cuts
$C_1, \ldots, C_{2k-1}$ in G such that the following holds: (i) The subgraph consisting of the
paths between the nodes in $C_{2i}$ and $C_{2i+2}$, the ith layer of G, is a randomized $\pi$-OBDD
(with several sources and sinks). (ii) The cut $C_{2i+1}$ decomposes this $\pi$-OBDD into an
upper part of paths where the non-probabilistic nodes are labeled by variables from
$L := \{x_{\pi(1)}, \ldots, x_{\pi(p)}\}$, and a lower part where the non-probabilistic nodes are
labeled by variables from $R := \{x_{\pi(p+1)}, \ldots, x_{\pi(n)}\}$.

Now we are ready to sketch a randomized protocol P by which two players, called
Alice and Bob, can evaluate $g^{\pi,p}$ in $(2k-1)$ rounds. Player Alice obtains an assignment
x to the variables in L and player Bob an assignment y to R as inputs. Both use the graph
G as an “oracle.” Player Alice starts the communication. It is easy to see how Alice and
Bob can jointly follow the computation path for the combined assignment $x + y$ in G
by exchanging the numbers of nodes on the cuts $C_1, \ldots, C_{2k-1}$ lying on such a path.
When a player encounters a node labeled by a probabilistic variable in her (his) part of
the graph, she (he) locally chooses a value for this variable at random and proceeds to
the corresponding successor. The last player (who is always Bob) outputs the value of
the reached sink as the result of the protocol.

Since each probabilistic variable can appear at most once on each computation path
in G, both players can choose the values of the probabilistic variables independently.
Because of the error guarantee of G, it follows that the above protocol P computes $g^{\pi,p}$
with error at most $\varepsilon$. Furthermore, the number of exchanged bits of communication is
at most $\sum_{i=1}^{2k-1} \lceil \log |C_i| \rceil \le (2k-1) \lceil \log |G| \rceil$. □
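
The simplest instance of this protocol, a deterministic OBDD (k = 1, a single message from Alice to Bob), can be simulated directly. The sketch below makes several simplifying assumptions that are not from the paper: the hypothetical node encoding (variable index, 0-successor, 1-successor), a cut defined implicitly as the first node on the path that does not test a variable from L, and a fixed node numbering obtained by sorting node names.

```python
from math import ceil, log2


def one_way_protocol(nodes, source, left_vars, x, y):
    """Simulate the one-round protocol on an OBDD whose ordering puts L first.

    x and y map variable indices to bits for Alice's and Bob's variables.
    Returns (output bit, number of communicated bits).
    """
    order = sorted(nodes)                 # node numbering known to both players
    # Alice follows the path as long as nodes test her variables.
    node_id = source
    while not isinstance(nodes[node_id], int) and nodes[node_id][0] in left_vars:
        var, succ0, succ1 = nodes[node_id]
        node_id = succ1 if x[var] else succ0
    message = order.index(node_id)        # number of the node on the cut
    bits = ceil(log2(len(nodes)))         # at most ceil(log |G|) bits
    # Bob continues from the communicated node using only his variables.
    node_id = order[message]
    while not isinstance(nodes[node_id], int):
        var, succ0, succ1 = nodes[node_id]
        node_id = succ1 if y[var] else succ0
    return nodes[node_id], bits


# Hypothetical example: OBDD for x_0 XOR x_1 with L = {x_0}, R = {x_1}.
XOR_BP = {"s": (0, "u", "w"), "u": (1, "zero", "one"),
          "w": (1, "one", "zero"), "zero": 0, "one": 1}
```

For `XOR_BP`, the protocol outputs the XOR of the two bits and charges ⌈log 5⌉ = 3 bits, in line with the bound of the lemma for k = 1.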

4 Lower Bounds for k-Stable Functions

Now we apply the lower bound technique presented in the last section to the class of
k-stable functions.

Definition 5. Let $k \in \{1, \ldots, n-1\}$. A function $f\colon \{0,1\}^n \to \{0,1\}$ defined on the
variable set X ($|X| = n$) is called k-stable if the following holds. For an arbitrary
set $X_1 \subseteq X$, $|X_1| = k$, and each variable $x \in X_1$ there is an assignment b to the
variables in $X \setminus X_1$ such that either $f(a + b) = a(x)$ for all assignments a to $X_1$ or
$f(a + b) = \neg a(x)$ for all assignments a to $X_1$.
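
For small n, Definition 5 can be tested exhaustively. The following sketch is a brute-force check that runs in exponential time and is meant only to make the quantifier structure concrete; the function names and the parity example are assumptions of this sketch.

```python
from itertools import combinations, product


def forces(f, n, X1, rest, x, b):
    """True if, with b fixed on rest, f equals a(x) for all a or not a(x) for all a."""
    values = set()
    for a in product((0, 1), repeat=len(X1)):
        full = [0] * n
        for var, bit in zip(X1, a):
            full[var] = bit
        for var, bit in zip(rest, b):
            full[var] = bit
        # f(a + b) XOR a(x) is constantly 0 in the first case, 1 in the second.
        values.add(f(tuple(full)) ^ full[x])
    return len(values) == 1


def is_k_stable(f, n, k):
    """Check Definition 5 for every k-subset X1 and every variable x in X1."""
    for X1 in combinations(range(n), k):
        rest = [v for v in range(n) if v not in X1]
        for x in X1:
            if not any(forces(f, n, X1, rest, x, b)
                       for b in product((0, 1), repeat=len(rest))):
                return False
    return True


def parity(x):
    return sum(x) % 2
```

Parity on three variables is 1-stable but not 2-stable: once two variables are left free, no fixing of the third makes the function depend on only one of them.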

It is a well-known fact that k-stable functions have size at least $2^k - 1$ for deterministic
read-once BPs. Lower bounds of this type have been proven by several authors, e. g., by
Dunne, Jukna, Krause, and Jukna, Razborov, Savický, and Wegener.
We list some examples from these papers.
Examples.

(1) Let $N := \binom{n}{2}$ and $1 \le k \le n$. Define the function $\mathrm{cl}_{n,k}\colon \{0,1\}^N \to \{0,1\}$ on the
Boolean variables $X := (x_{ij})_{1 \le i < j \le n}$. Let $G(X)$ be the undirected graph on the
nodes from $\{1, \ldots, n\}$ described by X, i. e., edge $\{i,j\}$ exists in $G(X)$ iff $x_{ij} = 1$.
Let $\mathrm{cl}_{n,k}(X) = 1$ iff the graph $G(X)$ contains a k-clique.

It holds that $\mathrm{cl}_{n,k}$ is s-stable for $s := \min\{\binom{k}{2} - 1, (n - k + 2)/2\}$. (This can
be proven easily by using the ideas contained in [17], and Jukna has proven a
similar result for the directed version of the clique function.)

(2) By $\mathrm{PM}_n, \mathrm{DET}_n\colon \{0,1\}^{n^2} \to \{0,1\}$, which are both functions defined on an $n \times n$
matrix of Boolean variables $X := (x_{ij})_{1 \le i,j \le n}$ as an input, denote the Boolean
permanent and the Boolean determinant, resp. The functions $\mathrm{PM}_n$ and $\mathrm{DET}_n$ are
both $(n-1)$-stable [20].

(3) Let $n = q^2 + q + 1$, where $q = p^m$, p is a prime and m an arbitrary natural
number. Let $P = \{1, \ldots, n\}$ be the set of “points” of a projective plane of order
q and let $L_1, \ldots, L_n \subseteq P$ be the “lines.” A set $A \subseteq P$ is called a blocking set if
$A \cap L_i \ne \emptyset$ for $i = 1, \ldots, n$. Define $B_n\colon \{0,1\}^n \to \{0,1\}$ by $B_n(x_1, \ldots, x_n) = 1$
iff $\{i \mid x_i = 1\}$ is a blocking set of size at most $q + k$, where $k := (q + 1)/2$
if q is prime, $k := \lceil \sqrt{q} \rceil$ otherwise.

The proof of the lower bound on the size of deterministic read-once BPs for $B_n$
shows that $B_n$ is k-stable.

(4) Let $n = 2^l$, and define $m := \lfloor n/l \rfloor$. Let $|u|_2$ denote the value of a Boolean vector
u interpreted as a binary number. Define $A\colon \{0,1\}^m \to \{0,1\}$ as follows. Chop
the input vector from $\{0,1\}^m$ into $s := \lfloor \sqrt{m} \rfloor$ blocks of size s each. Then A is the
disjunction of the conjunctions of all variables in each of these blocks.

Finally, define $\mathrm{ADDR}_n\colon \{0,1\}^n \to \{0,1\}$ by $\mathrm{ADDR}_n(x_0, \ldots, x_{n-1}) := x_a$,
$a := |(A(x^{l-1}), \ldots, A(x^0))|_2$, where $x^i := (x_{im}, \ldots, x_{(i+1)m-1})$, $i = 0, \ldots, l-1$.
It is easy to verify that $\mathrm{ADDR}_n$ is $(s-1)$-stable.
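
To make the definition of ADDR concrete, here is a small Python sketch. It assumes that the s blocks of size s are taken from the first s·s positions of each row (the text does not say what happens to leftover positions), and the function names are chosen for this illustration only.

```python
from math import isqrt


def A(row):
    """Disjunction over s blocks of the conjunction of the s bits in each block."""
    s = isqrt(len(row))                   # s = floor(sqrt(m))
    return int(any(all(row[j * s:(j + 1) * s]) for j in range(s)))


def addr(x, l):
    """ADDR_n(x) = x_a, where bit i of the address a is A(x^i)."""
    n = len(x)
    assert n == 2 ** l
    m = n // l                            # m = floor(n / l)
    a = sum(A(x[i * m:(i + 1) * m]) << i for i in range(l))
    return x[a]
```

For l = 4 we get n = 16, m = 4 and s = 2. If only row 0 contains a full block, say x = (1, 1, 0, ..., 0), the address is a = 1 and the output is x_1 = 1.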

The following technical lemma describes a large class of functions which are hard for 
randomized OBDDs with bounded error. We will show that the functions defined above 
belong to this class. 

Lemma 2. Let $\mathrm{INDEX}_m\colon U \times V \to \{0,1\}$, where $U := \{0,1\}^m$, $V := \{1, \ldots, m\}$,
be defined by $\mathrm{INDEX}_m(u, v) := u_v$ for $u = (u_1, \ldots, u_m) \in U$ and $v \in V$.

Let $g\colon \{0,1\}^n \to \{0,1\}$ be defined on the variable set X. Let k with $1 \le k \le n - 1$
be fixed. Assume that for each variable ordering $\pi$ on X, there is a parameter p with
$1 \le p \le n - 1$ such that $\mathrm{INDEX}_k$ is reducible to the partitioned version $g^{\pi,p}$ of g.
Let G be a randomized OBDD for g with arbitrary two-sided error $\varepsilon$, $\varepsilon < 1/2$. Then it
holds that $|G| = 2^{\Omega(k)}$.

Proof. Kremer, Nisan, and Ron [23] have shown that each randomized one-way
protocol which computes $\mathrm{INDEX}_k$ with two-sided error smaller than 1/8 needs $\Omega(k)$ bits
of communication. If the error of the randomized OBDD for g is smaller than 1/8, then
the lower bound of order $2^{\Omega(k)}$ immediately follows from Lemma 1.

To obtain the claimed lower bound for an arbitrary error probability $\varepsilon < 1/2$, we
use the fact that the error probability of randomized OBDDs can be decreased below an
arbitrarily small constant while maintaining polynomial size by “probability
amplification.” □

The above lemma can be applied to all k-stable functions, which yields the desired
generic lower bound.

Lemma 3. Let $g\colon \{0,1\}^n \to \{0,1\}$ be a k-stable function. Let G be a randomized
OBDD for g with arbitrary two-sided error $\varepsilon < 1/2$. Then it holds that $|G| = 2^{\Omega(k)}$.

Proof. We are going to construct a rectangular reduction from $\mathrm{INDEX}_k$ to a suitable
partitioned version of g.

Let an arbitrary variable ordering $\pi$ on the variable set X of g be given. For ease
of notation, we assume here that $\pi$ maps indices to the respective variables, i. e., $\pi$
is a function of the form $\pi\colon \{1, \ldots, n\} \to X$. Define $L := \{\pi(1), \ldots, \pi(k)\}$ and
$R := \{\pi(k+1), \ldots, \pi(n)\}$. We observe that, since g is k-stable, for each variable
$x \in L$, there is an assignment $b_x$ to R such that either $g(a + b_x) = a(x)$ for all
assignments a to L or $g(a + b_x) = \neg a(x)$ for all assignments a to L. Let us first assume
that always the former case occurs. In the following, we define a rectangular reduction
from $\mathrm{INDEX}_k$ to $g^{\pi,k}$.

The function $\varphi_1\colon U \to \{0,1\}^k$ is only a permutation of the bits of its input vector.
For an arbitrary input $u = (u_1, \ldots, u_k) \in U = \{0,1\}^k$, define the assignment a to the
variables in L by $a(x) := u_{\pi^{-1}(x)}$ for $x \in L$. Set $\varphi_1(u) := a$. The function
$\varphi_2\colon V \to \{0,1\}^{n-k}$ is defined by $\varphi_2(v) := b_{\pi(v)}$, where $v \in V = \{1, \ldots, k\}$. For arbitrary
$(u, v) \in U \times V$, we have $g^{\pi,k}(\varphi_1(u), \varphi_2(v)) = \mathrm{INDEX}_k(u, v)$, hence, $(\varphi_1, \varphi_2)$ is a
rectangular reduction from $\mathrm{INDEX}_k$ to $g^{\pi,k}$.

We still have to handle the case that for some variables $x \in L$, it holds that
$g(a + b_x) = \neg a(x)$ for all assignments a to L. For this case, we slightly extend our
reduction concept. In addition to the transformation of the input by the pair of
functions $(\varphi_1, \varphi_2)$, we allow the result for the “target problem,” $g(\varphi_1(u), \varphi_2(v))$,
to be negated depending on the input $v \in V$. More precisely, such a reduction consists of
$\varphi_1$, $\varphi_2$ and an additional function $\nu\colon V \times \{0,1\} \to \{0,1\}$ for which
$\nu(v, g(\varphi_1(u), \varphi_2(v))) = f(u, v)$ for all $(u, v) \in U \times V$.

It is easy to see that an analogous version of Lemma 1 from the last section holds
for this extended type of reductions. Here we choose $\nu(v, c) = c$ for $c \in \{0,1\}$ if
$g(a + b_{\pi(v)}) = a(\pi(v))$ for all assignments a to L, and $\nu(v, c) = \neg c$ for $c \in \{0,1\}$ if
$g(a + b_{\pi(v)}) = \neg a(\pi(v))$ for all assignments a to L. One easily verifies that for this
choice of $\nu$ and $\varphi_1, \varphi_2$ it holds that $\nu(v, g(\varphi_1(u), \varphi_2(v))) = \mathrm{INDEX}_k(u, v)$ for all
$(u, v) \in U \times V$. □

From this lemma, we immediately obtain that the examples of k-stable functions
already mentioned are all hard for randomized OBDDs with bounded error:

Theorem 1. $\mathrm{cl}_{n,n/2}, \mathrm{PM}_n, \mathrm{DET}_n, B_n, \mathrm{ADDR}_n \notin \text{BPP-OBDD}$.

Apart from these functions, there are some “pointer functions” in the sense of the
informal definition from the introduction which cannot be k-stable for large k because they
are contained in the class P-BP1. For these functions, we can still apply Lemma 2. As
examples we consider the following standard functions from the literature on OBDDs.
Definition 6.

(1) The function $\mathrm{HWB}_n$ (“hidden weighted bit”) is defined on $x = (x_1, \ldots, x_n)$.
Define $\mathrm{sum}(x) := \sum_{i=1}^{n} x_i$ and $x_0 := 0$. Then $\mathrm{HWB}_n(x) := x_{\mathrm{sum}(x)}$.

(2) The function $\mathrm{ISA}_n$ (“indirect storage access”) is defined on $n = 2^r + r$
variables $x_0, \ldots, x_{2^r-1}$ and $y_0, \ldots, y_{r-1}$. Let $s := \lfloor 2^r/r \rfloor$. Let $\mathrm{ISA}_n(x, y) = x_j$,
where $j := |(x_{ir}, \ldots, x_{(i+1)r-1})|_2$ and $i := |(y_{r-1}, \ldots, y_0)|_2$ (if $i \ge s$, let
$\mathrm{ISA}_n(x, y) = 0$).
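
Both functions are short to state in code. The sketch below follows the definitions above under two assumptions that the text leaves implicit: in a tuple written as $|(\ldots)|_2$ the first listed bit is the most significant one, and out-of-range block indices ($i \ge s$) give output 0.

```python
def hwb(x):
    """Hidden weighted bit: x = (x_1, ..., x_n); output x_sum(x), with x_0 := 0."""
    s = sum(x)
    return 0 if s == 0 else x[s - 1]      # x[s - 1] is x_s in 1-based indexing


def isa(x, y):
    """Indirect storage access on x_0..x_{2^r - 1} and y_0..y_{r-1}."""
    r = len(y)
    assert len(x) == 2 ** r
    i = sum(y[t] << t for t in range(r))  # i = |(y_{r-1}, ..., y_0)|_2
    if i >= (2 ** r) // r:                # no complete block with this index
        return 0
    block = x[i * r:(i + 1) * r]          # (x_{ir}, ..., x_{(i+1)r - 1})
    j = int("".join(map(str, block)), 2)  # first listed bit is most significant
    return x[j]
```

In both cases a part of the input is first used to compute an address, and the addressed input bit is then returned, which is the “pointer function” structure referred to in the text.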

The functions $\mathrm{HWB}_n$ and $\mathrm{ISA}_n$ have been introduced by Bryant and by Breitbart,
Hunt, and Rosenkrantz, resp., who have also shown that these functions have
exponential size for deterministic OBDDs. Sieling and Wegener have proven that both
functions are contained in the class P-BP1. We complement this by the following result.

Theorem 2. $\mathrm{HWB}_n, \mathrm{ISA}_n \notin \text{BPP-OBDD}$.

Sketch of Proof. This follows immediately by Lemma 2 and the known lower bound
proofs for these functions. These proofs can be seen as rectangular reductions
from the function INDEX to appropriate partitioned versions of $\mathrm{ISA}_n$
and $\mathrm{HWB}_n$. (An explicit construction for the function $\mathrm{ISA}_n$ can be found in the
cited ECCC report.) □

5 A k-Stable Function with Small Randomized Read-Once BPs

In this section, we show that there is no generic lower bound on the size of
randomized read-once BPs for k-stable functions as is the case for randomized OBDDs.
We prove that the function $\mathrm{ADDR}_n$ from the paper of Jukna, Razborov, Savický, and
Wegener can even be computed by a randomized read-once BP with zero error.

For notational convenience, we consider the function $\mathrm{ADDR}_n$ only for input sizes
where we can do without floors or ceilings.

Theorem 3. Let $n = 2^l$ and $l = 2^t$. The function $\mathrm{ADDR}_n$ can be represented by a
randomized read-once BP of polynomial size with zero error and failure probability at
most 1/2.

Proof. Define $m := n/l = 2^{l-t}$. Let A, $x^i$ ($i = 0, \ldots, l-1$), and $x_a$ be as in the
definition of $\mathrm{ADDR}_n$. Call the bits $A(x^0), \ldots, A(x^{l-1})$ “address bits” and the bit $x_a$
“output bit.” Imagine the input variables of $\mathrm{ADDR}_n$ to be arranged as an $l \times m$-matrix
with rows $x^0, \ldots, x^{l-1}$. The algorithm implemented by the randomized read-once BP
for $\mathrm{ADDR}_n$ will consist of two phases. In the first phase, we read some rows of the
input matrix and compute the respective address bits. After that, only a small set A of
possible output bits will be left. The second phase consists of evaluating all remaining
address bits and “storing” the values of all variables in A in the branching program.
Finally, we have determined the complete address. With probability at least 1/2, the
addressed bit will belong to the stored values.

By $v = (v_{l-1}, \ldots, v_0) \in \{0, 1, *\}^l$, we describe the address bits computed so far in
the algorithm; let $v_i = *$ if the ith bit is not yet known. The bits $v_0, \ldots, v_{l-t-1}$, called
“column address bits” in the following, determine the column where the output bit is
found. Likewise, the “row address bits” $v_{l-t}, \ldots, v_{l-1}$ determine the row of the output
bit.

For an arbitrary vector v, let $C(v) \subseteq \{0, \ldots, m-1\}$ be the set of columns which
are addressed by vectors $v'$ which are obtained from v by assigning constant values to
the $*$-bits, and let $R(v) \subseteq \{0, \ldots, l-1\}$ be the set of rows addressed in this way. Define
$A(v) := \{im + j \mid i \in R(v), j \in C(v)\}$ as the set of indices of addressed output bits.
Now we describe our randomized algorithm for the computation of $\mathrm{ADDR}_n$.

Algorithm 1.

(0) Initialize v: For $i = 0, \ldots, l-1$, let $v_i := *$.

(1) Choose $z \in \{0,1\}$ uniformly at random.

(2) Case z = 0:

Phase 1: For $i \in \{l-t, \ldots, l-1\}$, read $x^i$ and compute $v_i := A(x^i)$. Let
$r := |(v_{l-1}, \ldots, v_{l-t})|_2 \in \{0, \ldots, l-1\}$, i. e., r is the row within which the output bit
lies. Then $R(v) = \{r\}$. If $r \ge l - t$, we have “lost” and output “?”.
Now assume that $r \in \{0, \ldots, l-t-1\}$. For $i \in \{0, \ldots, l-t-1\} \setminus \{r\}$ read the
row $x^i$ and compute $v_i = A(x^i)$. After this we have also determined all bits of the
column address except one. Hence, $|C(v)| = 2$ and thus also $|A(v)| = 2$.

Phase 2: As the final step, we evaluate the last missing address bit $v_r = A(x^r)$.
While we compute $v_r$, we store the values of the two variables $x_j$ with $j \in A(v)$
(these variables lie within row r). Afterwards, we know the complete address of the
output bit, $a = |(v_{l-1}, \ldots, v_0)|_2$. Since we have stored both possible output bits,
we can output the correct value.

(3) Case z = 1:

Phase 1: For $i \in \{0, \ldots, l-t-1\}$ read the row $x^i$ of the input matrix and compute
$v_i := A(x^i)$. After this, we have $C(v) = \{c\}$, where $c = |(v_{l-t-1}, \ldots, v_0)|_2$, and
hence, $A(v) = \{im + c \mid l-t \le i \le l-1\}$. Notice that $|A(v)| = t = \log\log n$.

Phase 2: Now read all remaining rows $x^i$ with $i \in \{l-t, \ldots, l-1\}$, but again store
all values of variables $x_j$ with $j \in A(v)$ (i. e., the variables in column c). Finally,
we know the complete address $a = |(v_{l-1}, \ldots, v_0)|_2$ of the output bit. If it holds
that $\lfloor a/m \rfloor \le l-t-1$, i. e., the row where the output bit is found has already been
read in Phase 1, output “?”. Otherwise, we can output the stored value of $x_a$.

It is easy to verify that the above algorithm in fact has zero error and outputs “?” with
probability at most 1/2. The algorithm can be coded into a randomized read-once BP by
the standard construction techniques for branching programs. We have ensured already
in the description of the algorithm that each variable is only read once. For the
evaluation of the bits $v_i$, we use polynomial size branching programs for A as sub-modules.
We can at any time store the parts of the vector v computed so far since the whole
vector only has length l. The second phases can be represented in polynomial size since
always $|A(v)| \le \log\log n$ and hence, we need only to enlarge the width of the
branching program by a logarithmic factor in order to store all the needed values. □
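
The zero-error property can be checked on small instances by simulating the two cases of the algorithm. The sketch below simulates the algorithm, not the branching program itself. It assumes $n = 2^l$, $l = 2^t$, $m = 2^{l-t}$, and the block layout of A used earlier (blocks taken from the first s·s positions of a row); the helper names are hypothetical, and the `read_row` guard only asserts that no row is read twice.

```python
from math import isqrt


def A(row):
    """Disjunction over s blocks of the conjunction of the s bits in each block."""
    s = isqrt(len(row))
    return int(any(all(row[j * s:(j + 1) * s]) for j in range(s)))


def addr(x, l):
    """Reference implementation of ADDR_n for n = 2^l."""
    m = len(x) // l
    return x[sum(A(x[i * m:(i + 1) * m]) << i for i in range(l))]


def las_vegas_addr(x, l, t, z):
    """Run the algorithm with coin z; returns 0, 1 or '?'."""
    m = len(x) // l
    read = set()

    def read_row(i):
        assert i not in read              # read-once: no row is read twice
        read.add(i)
        return x[i * m:(i + 1) * m]

    v = [None] * l
    if z == 0:
        for i in range(l - t, l):         # Phase 1: row address bits
            v[i] = A(read_row(i))
        r = sum(v[i] << (i - (l - t)) for i in range(l - t, l))
        if r >= l - t:
            return "?"                    # row r has already been consumed
        for i in range(l - t):
            if i != r:
                v[i] = A(read_row(i))
        base = sum(v[i] << i for i in range(l - t) if i != r)
        row = read_row(r)                 # Phase 2: store both candidates
        stored = {base: row[base], base | (1 << r): row[base | (1 << r)]}
        v[r] = A(row)
        return stored[base | (v[r] << r)]
    for i in range(l - t):                # Phase 1: column address bits
        v[i] = A(read_row(i))
    c = sum(v[i] << i for i in range(l - t))
    stored = {}
    for i in range(l - t, l):             # Phase 2: store column c
        row = read_row(i)
        stored[i] = row[c]
        v[i] = A(row)
    r = sum(v[i] << (i - (l - t)) for i in range(l - t, l))
    return "?" if r <= l - t - 1 else stored[r]
```

For every input, exactly one of the two coin values leads to “?”, and the other returns the value of ADDR, so the failure probability is exactly 1/2 in this simulation.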

We have thus obtained an exponential gap between “Las Vegas” and deterministic
algorithms for the read-once BP model. Together with the result of Jukna, Razborov,
Savický, and Wegener that $\mathrm{ADDR}_n$ is representable in polynomial size by
nondeterministic and co-nondeterministic read-once BPs, we obtain:

Corollary 1. P-BP1 $\subsetneq$ ZPP-BP1 $\cap$ NP-BP1 $\cap$ coNP-BP1.
Abstract. We uncover a new class of attacks that can potentially affect
any cryptographic protocol. The attack is performed by an adversary 
that at some point has access to the physical memory of a participant, 
including all its previous states. 

In order to protect protocols from such attacks, we introduce a cryp- 
tographic primitive that we call erasable memory. Using this primitive, 
it is possible to implement the essential cryptographic action of forget- 
ting a secret. We show how to use a small erasable memory in order to 
transform a large non-erasable memory into a large and erasable mem- 
ory. In practice, this shows how to turn any type of storage device into a 
storage device that can selectively forget. Moreover, the transformation 
can be performed using the minimal assumption of the existence of any 
one-way function, and can be implemented using any block cipher, in 
which case it is quite efficient. We conclude by suggesting some concrete 
implementations of small amounts of erasable memory. 



1 Introduction 

All of cryptography is based on the assumption that some information can be 
kept private, accessible only by certain parties. At some level, physical control 
over the storage device containing the information must be maintained. While 
it is reasonable to assume that such control can be maintained for some finite 
amount of time, as the duration of a protocol, it is not realistic to assume it 
continues indefinitely. Thus, it is important to be able to forget information. 

Practical considerations. Current computer systems treat erasing memory
only from the point of view of efficiency, not security. Erasing a memory location
enables the system to use that location for other purposes; it does not guarantee
that the information is destroyed. In fact, in many practical systems the old
value might still be retrievable, making the assumption that data is forgotten,
if not specifically remembered, a fallacy. For example, it has been observed that
if a message is stored for a long time in some types of RAM, the chip will slowly
‘learn’ this value. Overwriting the data or powering down the chip will not erase
it. When the chip is powered up again there is a good chance that the data will
again appear in the RAM. Some of these effects are due to aging effects in
the silicon that depend on the bits stored in the RAM. Another example is the
case of magnetic media. Here, the problem is more severe, since these devices
are notoriously hard to wipe. It has long been known that simply overwriting
the data is not sufficient; US government specifications call for overwriting the
data three times for non-classified information, in order to minimize the chances
of their retrieval by an adversary. However, the magnetic fields used to store
the data have a tendency to migrate away from the read/write head and bury
themselves deep down in the magnetic material, so even if these fields are no
longer readable by the original read/write head, other techniques might well
retrieve them. The problem gets even more complicated if the operating system
supports virtual memory. A typical virtual memory system will write a piece of
RAM to disk in a swap file without telling the program using the RAM. When the
program tries to access this RAM, the operating system reads the data from the
swap file back into RAM for the program to use. Suppose the program overwrites
the RAM in an attempt to erase the data and then exits. The old copy in the
swap file is not erased by the operating system, but just marked as ‘available’.
A direct inspection of the swap file will reveal the old values of the RAM.

Cryptographic protocols. In the security and cryptography literature, many
protocols make the silent assumption that we can ‘forget’ information. For
example, the randomly chosen ‘temporary secret key’ in DSS and similar schemes
should be forgotten by the signer after it has been used. If it can be
found by an attacker, he will be able to reconstruct the secret key of the signer
from the signature and the temporary secret key. More generally, participants in
cryptographic protocols should forget all partial results, randomness values and
temporarily stored information they use while executing such protocols.
Otherwise this information could later fall into the hands of some adversary, who will
be able to use it to attack the scheme. Clearly, this is an issue that can
potentially arise in just about any cryptographic protocol (apart possibly from very few
exceptions). Known cryptographic techniques, such as ‘proactivization’
or ‘forward security’, do not avoid the problem. The
former technique maintains secrecy with respect to intruders that occasionally gain
access to some of the storage devices, by changing the way secrets are shared
between such devices. This, however, is of no help if the old shares cannot be
forgotten. The latter techniques guarantee security with respect to adversaries
that obtain the current state of the memory, but still know nothing about the
previous states (which we consider here).

Our results. We investigate methods for storing data in such a way that
information can be securely forgotten. To this purpose, we put forward the new notion
of erasable memory, a memory which allows one to definitely and reliably erase
values, by keeping only the most recently stored ones. We then formalize the type
of attack that an adversary can mount on any cryptographic protocol which
involves storage of (secret) values on any non-erasable memory. This leads to
the formal definition of a secure erasable memory implementation as a method
using a small piece of erasable memory in order to transform a large piece of
non-erasable memory into large and erasable memory. Then, our main result
consists in exhibiting such a secure erasable memory implementation under the
assumption of existence of any pseudo-random permutation, which is known to
exist under the existence of any one-way function. We can show
that this assumption is minimal too. In practice, pseudo-random permutations
can be implemented using a block cipher such as some composed version of DES
(e.g., Triple-DES). In this case, our method also seems practical and efficient:
the amount of erasable memory needed is only the amount to store the key for
and compute one Triple-DES application; the storage overhead of our solution is
only linear, and the computational overhead per memory access is a logarithmic
number (in the size of the memory) of encryptions (this can be amortized to
constant in some typical cases). While one would probably not want to use our
method for all data, it would be quite reasonable for the small amount of data
that is important for security purposes.



Outline of the paper: We introduce our terminology and model in Section 2;
we then present our construction of large erasable memory and finally discuss
three possible concrete realizations of small erasable memory.



2 Definitions and Model 

In this section we recall the notion of pseudo-random permutations j!tll 4) . we 
introduce the model for secure erasable memory implementation and briefly 
discuss a first (inefficient) solution to our problem. 

2.1 Pseudo-random Permutations. 

Pseudo-random functions have been introduced in 0. In this paper we will 
use the formalization of finite pseudorandom functions given in [2] (a concrete
version of the formalization in [PI 1 4j V 

Let $\mathcal{P}$ be the family of all permutations, and let
$G = \{E(k,\cdot), D(k,\cdot)\}_k \subseteq \mathcal{P}$ be a family of
permutations, where $E(k,\cdot)$ denotes a finite permutation and $D(k,\cdot)$
its inverse. Each element from $G$ is specified by a $K$-bit key $k$; therefore,
uniformly choosing an element from $G$ is equivalent to uniformly choosing
$k \in \{0,1\}^K$ and returning $(E(k,\cdot), D(k,\cdot))$. Let $A$ be an
algorithm with oracle access to an element from $G$. We let
$E[A^G] = \Pr[(E(k,\cdot), D(k,\cdot)) \leftarrow G : A^{E(k,\cdot),D(k,\cdot)} = 1]$,
i.e., the probability that $A$ outputs 1 when $A$'s oracle is selected to be a
random permutation from $G$. If $G, G'$ are two families of finite permutations,
we let $\mathrm{Adv}_A(G, G') = E[A^G] - E[A^{G'}]$ denote the advantage of $A$
in distinguishing $G$ from $G'$. Here, following [3], we are considering the
following game, or statistical test. Algorithm $A$ is given as oracle a
permutation $g$ chosen at random from either $G$ or $G'$, the choice being made
randomly according to a bit $b$. Then $A$ will try to predict $b$. The advantage
is $\Pr[A^g = b] - 1/2$, i.e., the amount by which the probability that $A$'s
output is correct exceeds the probability that a random guess for $b$ is
correct. We say that the family of permutations $G$ is
$(t,q,\epsilon)$-pseudorandom if there is no $A$ which, running in time $t$ and
making $q$ oracle queries, is able to obtain $\mathrm{Adv}_A(G, \mathcal{P}) > \epsilon$.
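
The two ways of writing the advantage above, as a difference
$E[A^G] - E[A^{G'}]$ and as $\Pr[A^g = b] - 1/2$, can be related by a short
calculation. The following is only a sketch under an assumption that the text
does not state explicitly: the bit $b$ is uniform, $b = 1$ selects $G$, and
$b = 0$ selects $G'$. Under that convention the two quantities agree up to a
constant factor of 2, which does not affect the asymptotic meaning of the
definition.

```latex
\begin{align*}
\Pr[A^{g} = b]
  &= \tfrac12 \Pr[A^{g} = 1 \mid b = 1] + \tfrac12 \Pr[A^{g} = 0 \mid b = 0] \\
  &= \tfrac12\, E[A^{G}] + \tfrac12 \bigl(1 - E[A^{G'}]\bigr), \\
\Pr[A^{g} = b] - \tfrac12
  &= \tfrac12 \bigl(E[A^{G}] - E[A^{G'}]\bigr).
\end{align*}
```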

2.2 Erasable Memory Implementation. 

By the term memory we will denote any type of storage device. We will then 
consider two main types of memory: persistent memory and erasable memory. 
Persistent memory doesn’t reliably forget former values (i.e., it allows the re- 
trieval of formerly stored values). In fact, in order to prove the strongest result, 
we make the pessimistic assumption that all values that were ever written to 
the persistent memory can be retrieved by an attacker. In contrast, erasable 
memory reliably forgets old values (i.e., at any location, only the most recently 
stored value can be retrieved). Due to the practical considerations previously 
discussed, we should think of a typical computer storage device as being entirely 
persistent and of the erasable memory as a different and small piece of
memory having (typically) some kind of physical implementation (we discuss
three possible ones later, in Section 4).

We will consider an erasable memory implementation as a probabilistic data 
structure algorithm that translates read and write operations for a logical array, 
called the virtual memory, into read and write operations to a physically imple- 
mented array, called the physical memory. At the beginning of the system, the 
physical memory is preprocessed into two parts: the erasable memory and the 
persistent memory, where we should think of the erasable memory as being much 
smaller than the persistent one. The goal of the erasable memory implementation 
is to transform the physical memory into a single erasable memory. 

Informally speaking, an attack on an erasable memory implementation pro- 
ceeds in the following way. An adversary picks two sequences of read and write 
operations which are not trivially distinguishable. The data structure is then 
simulated on both sequences, computing the final state of the erasable memory 
and the entire history of writes to the persistent memory. Then the two pairs of 
histories and final erasable memory states are sent to the adversary in a random 
order. The adversary tries to predict which physical memory history corresponds 
to which sequence of operations. The implementation is considered secure if no 
adversary can succeed significantly better than by a random guess. 

Definition 1. Let $m, e, p$ be integers, and let $M, EM$ be arrays denoting,
respectively, the memory and the erasable memory, and such that $|M| = m$,
$|EM| = e$. Let $PM$ be an array denoting the persistent memory, and made
of $p$ lists $pm_1, \ldots, pm_p$, such that $pm_i = (v_{i,1}, \ldots, v_{i,s_i})$
denotes the sequence of values ever written into location $i$ of the persistent
memory (i.e., $v_{i,1}$ is the least recent and $v_{i,s_i}$ the most recent,
where $s_i$ is the number of values ever stored into location $i$). Let
$OP = \{\mathrm{Read}, \mathrm{Write}\}$ be the set of memory operations. An
instance of a memory operation $op \in OP$ is the tuple
$ins = (op; i; inp_1, \ldots, inp_c; out_1, \ldots, out_d)$, where argument $i$
points to the input location, arguments $inp_j$ are additional inputs for $op$
and arguments $out_j$ are outputs of $op$ when executed on input location $i$
and additional inputs $inp_1, \ldots, inp_c$. We will denote as valid all
instances of the form

1. (Read; $i$; $EM, PM$; $cont$);

2. (Write; $i$; $EM, PM, cont$; $EM, PM$);

where $i \in \{1, \ldots, p\}$, $cont \in \Sigma$, for some alphabet $\Sigma$.
Here, the read operation transfers into $cont$ the most recent value
$v_{i,s_i}$ previously stored at location $i$ of array $PM$; the operation
possibly uses $EM$, and returns $cont$. The write operation, possibly using
array $EM$, inserts $cont$ into the last position of the list associated with
location $i$ in array $PM$. Namely, it increments $s_i$ by 1 and sets
$v_{i,s_i} = cont$. We say that two sequences of length $T$ of valid instances
of memory operations are $T$-equivalent if they contain the same subsequence of
pairs $(op, i)$, for $op$ = Write (but possibly different additional inputs or
outputs, e.g. $cont$, or different subsequences with Read operations).^1

We are now ready to define a secure erasable memory implementation. 

Definition 2. An erasable memory implementation (EMI) is a pair of
probabilistic algorithms EMI = (Preprocess, Update). On input a memory array
$M$, algorithm Preprocess returns an erasable memory array $EM$ and a
persistent memory array $PM$. On input arrays $EM$, $PM$, and an instance of a
memory operation $ins$, algorithm Update checks if the operation $ins$ is
valid; if so, it returns updated arrays $EM$, $PM$; otherwise it returns the
unchanged input arrays $EM$, $PM$. Now, let $A$ be an algorithm and let
$\mathrm{Distinguish}_A(M, OP)$ be the following probabilistic experiment:

{ $(EM, PM) \leftarrow \mathrm{Preprocess}(M)$;

  $((ins_{0,1}, \ldots, ins_{0,T}), (ins_{1,1}, \ldots, ins_{1,T})) \leftarrow A(EM, PM)$;

  $EM_0 \leftarrow EM$; $PM_0 \leftarrow PM$; $EM_1 \leftarrow EM$; $PM_1 \leftarrow PM$;

  $(EM_0, PM_0) \leftarrow \mathrm{Update}(ins_{0,i}, EM_0, PM_0)$, for $i = 1, \ldots, T$;

  $(EM_1, PM_1) \leftarrow \mathrm{Update}(ins_{1,i}, EM_1, PM_1)$, for $i = 1, \ldots, T$;

  $b \leftarrow \{0, 1\}$; $d \leftarrow A(EM_b, PM_b, EM_{1-b}, PM_{1-b})$;

  if $((ins_{0,1}, \ldots, ins_{0,T}), (ins_{1,1}, \ldots, ins_{1,T}))$ are
  $T$-equivalent and $b = d$ then return: 1 else return: 0; }

We say that the erasable memory implementation is $(T, \epsilon)$-secure if for
any adversary $A$ making $T$ memory operations, the probability that the
experiment $\mathrm{Distinguish}_A(M, OP)$ returns 1 is at most $1/2 + \epsilon$.

We note that in the above definition we require the adversary to submit only
T-equivalent sequences of valid virtual memory operations. This prevents the
adversary from distinguishing the two sequences in trivial ways (e.g., when
the two sequences write to different memory locations).

^1 We note that we can also handle different definitions of T-equivalent
sequences, in different scenarios.

Related notions. The notion of crypto-paging CMOl was introduced to study
the problem of hiding information from secondary storage devices, and can be
used to give a preliminary solution to our problem (see Section 2.3). A problem
similar to ours is considered in 0, where a method similar to crypto-paging 
is suggested, but without a formal model or proofs. In m the authors study 
software protection in a model where one wants to protect any information about 
the communication between CPU and memory, including any access patterns. 
In Util the authors study the problem of storing information in memory in such 
a way that the location of the stored item is hidden to the observer of the 
communication. The fact that in our model an adversary is allowed at some 
time to look at the content of both the erasable memory and the memory (i.e., 
reading private keys and opening any encryption) makes these last two models
incomparable to ours.

2.3 An Inefficient Solution 

A simple (but inefficient) solution to the problem of constructing a secure erasable 
memory implementation can be obtained as an immediate adaptation of the 
crypto-paging concept as we now show. The preprocessing operation con- 
sists in uniformly choosing a key k for a pseudo-random permutation, and en- 
crypting all the data using k; then the encrypted data is stored in the persistent 
memory, and the key k is stored in the erasable memory. In order to read a value, 
using the current key k in the erasable memory, the encryption of the value is de- 
crypted and the data is recovered. In order to write a value, a new key k' for the 
pseudo-random permutation is uniformly chosen, the value is encrypted under 
key k' and the encryption is stored in the persistent memory. In order to guaran- 
tee security, the new key k' needs to be stored into the erasable memory, replacing 
the old key k; moreover, all items in the memory need to be re-encrypted using 
the new key k'. Notice that the number of encryptions/decryptions at each
memory operation is quite large (linear in the size of the memory); this motivates
the search for more efficient solutions. 
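
The inefficient solution above can be illustrated with a small sketch. Everything
concrete in it is an assumption made for illustration: the class name
`NaiveErasableMemory`, the 32-byte block size, and above all the permutation
`E`/`D`, which is a toy 4-round Feistel network built from SHA-256 that merely
stands in for the pseudo-random permutation (in practice a real block cipher
would be used). The point it shows is that every write re-encrypts all locations.

```python
import hashlib
import secrets

HALF = 16  # toy parameter: the permutation acts on blocks of 2 * HALF bytes


def _f(key: bytes, i: int, half: bytes) -> bytes:
    # Round function of the toy Feistel network (illustration only).
    return hashlib.sha256(key + bytes([i]) + half).digest()[:HALF]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def E(key: bytes, block: bytes) -> bytes:
    """Toy keyed permutation; a stand-in for E(k, .), not a vetted cipher."""
    L, R = block[:HALF], block[HALF:]
    for i in range(4):
        L, R = R, _xor(L, _f(key, i, R))
    return L + R


def D(key: bytes, block: bytes) -> bytes:
    """Inverse of E under the same key."""
    L, R = block[:HALF], block[HALF:]
    for i in reversed(range(4)):
        L, R = _xor(R, _f(key, i, L)), L
    return L + R


class NaiveErasableMemory:
    def __init__(self, data):
        self.em = secrets.token_bytes(HALF)        # erasable memory: current key
        self.pm = [[E(self.em, v)] for v in data]  # persistent memory: full histories

    def read(self, i):
        return D(self.em, self.pm[i][-1])

    def write(self, i, value):
        new_key = secrets.token_bytes(HALF)
        for j, history in enumerate(self.pm):
            plain = value if j == i else D(self.em, history[-1])
            history.append(E(new_key, plain))      # re-encrypt every location
        self.em = new_key                          # overwrites the old key
```

Each call to `write` appends one ciphertext to every history list, which is the
linear cost per operation noted above.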

3 Our Method for Obtaining Large Erasable Memory 

In this section we present our construction for obtaining a secure erasable
memory implementation. The main cryptographic tool we use is a family of
pseudo-random permutations $\{E(k,\cdot), D(k,\cdot)\}_k$ (which, in practice,
can be implemented by any block cipher). Formally, we achieve the following:

Theorem 3. Let $m$ be the size of the memory. Given a family of pseudo-random
permutations $F = \{E(k,\cdot), D(k,\cdot)\}_k$, there exists a secure erasable
memory implementation EMI = (Preprocess, Update), such that, for any integer
$l > 1$, the following holds:

1. Security. If $F$ is $(T, q, \epsilon)$-pseudorandom, then EMI is
$(T', \epsilon')$-secure, for $T' = T - O(lT \log_l m)$, and
$\epsilon' = \epsilon / (T \log_l m)$;






2. Space Complexity. If $F$ has block size $a$ and key size $K$, and is
computable in space $s$, then EMI uses a persistent memory of size $O(m)$ and
an erasable memory of size $O(a + s + K)$;

3. Time Complexity. Let $\mathcal{D}$ be the distribution of the location
accessed to the memory; if $E(k,\cdot), D(k,\cdot)$ can be computed in time $t$
then, in order to process $T$ memory operations, EMI takes time
$O(T t\, l \log_l m)$ for an arbitrary $\mathcal{D}$ or time
$O(t(T + l \log_l m))$ if $\mathcal{D}$ returns consecutive values.

We observe that the size of the persistent memory can be any polynomial in 
the size of the erasable memory input to EMI. Moreover, our erasable memory 
implementation requires only a logarithmic (in the size of the memory) num- 
ber of encryption/decryption operations in the worst case distribution over the 
possible memory operations, and it achieves an amortized constant number of 
encryption/decryption operations in the case the memory operations are made 
on consecutive locations. In the following, we first present an overview of our 
construction and then a formal description. Further issues on the efficiency of 
our scheme, discussions about dividing the write operation into a write and an 
erase stage, and a proof that our construction satisfies Theorem 3 are in jSj.

An overview of our construction. Let $G = \{E(k,\cdot), D(k,\cdot)\}_k$ denote
a family of pseudo-random permutations (or, secure block ciphers). We arrange
the persistent memory in a complete $l$-ary tree $H$, for $l \geq 2$. The root
of the tree is contained in the erasable memory, the internal nodes correspond
to the persistent memory addresses, and the leaves contain the data from the
virtual memory. Therefore, we need more persistent memory than virtual memory,
mainly for the interior nodes, whose number is that of the virtual locations
divided by $l - 1$ (thus, increasing $l$ decreases the memory overhead). At
each interior node $x$, there is an associated key $k_x$ and the list of values
ever stored at this node's location. The key associated to a leaf is equal to
the content of the corresponding location in the persistent memory. At each
physical location $x$, we store $E(k_{p(x)}, k_x \circ j)$, where $x$ is the
$j$-th child of its parent $p(x)$. To perform either a read or a write
operation on a certain position of the persistent memory, we need to access the
content of the corresponding leaf in the tree; therefore we follow its path
starting from the root, decrypting each physical location's contents with its
parent's key to get its key. Then, in the case of a read operation, we just
return the most recently stored value at that leaf. In the case of a write
operation, we first insert the value to be written into the list associated to
that leaf. Then, we follow the same path now starting from the leaf, and we
pick new keys for each node along the path. For each such node, we decrypt its
children's most recently stored value and re-encrypt their keys with the
parent's new key. This encryption is then inserted into the list associated to
that node. We note that before and while computing the encryption/decryption
function, we must transfer to the erasable memory all input values, keys and
intermediate values used. However, we can do the above so that only a constant
number of keys are in the erasable memory at any one time.

A formal description of our construction. Our formal description considers
the most general case of arbitrary distribution over the memory operations (the
case of consecutive locations can be directly derived from it and is omitted).
In order to make the description cleaner, we assume that the intermediate
values in the computation of algorithms $E$ and $D$ which need to be stored
into the memory will always be stored into the erasable memory. By $p(x)$ and
$c_j(x)$ we denote the parent of node $x$, and the $j$-th child of node $x$,
respectively.

The algorithm Preprocess

Input: an array $M[1 \ldots m]$.

Instructions:

1. Initialize empty arrays $PM[1 \ldots p]$ and $EM[1 \ldots e]$.

2. Arrange $PM$ as a complete $l$-ary tree $H$ of height $\lceil \log_l m \rceil$,
   where $2 \leq l \leq 2^{a-K}$.

3. Denote by $l_i$ the $i$-th leaf of $H$ and let $k_{l_i} = M[i]$, for $i = 1, \ldots, m$.

4. For each node $x$ of $H$,

   if $x$ is not a leaf then uniformly choose a key $k_x \in \{0,1\}^K$;
   let $j$ be such that $x$ is the $j$-th child of node $p(x)$;

   store $k_x \circ j$ into $EM[2]$ and $k_{p(x)}$ into $EM[3]$ and set $z_x = E(k_{p(x)}, k_x \circ j)$;
   if $x$ is the root then store $k_x$ into $EM[1]$;
   else insert $z_x$ into list $pm_i$,

   where $i$ is the location in $PM$ associated with node $x$;

5. return: $(EM, PM)$ and halt.

The algorithm Update

Input:

1. an instance $ins$ of an operation $op \in OP = \{\mathrm{Read}, \mathrm{Write}\}$;

2. arrays $PM[1 \ldots p]$ (storing tree $H$) and $EM[1 \ldots e]$.

Instructions:

1. If $ins$ is not valid then return: $\bot$ and halt.

2. Let $q$ be the path in tree $H$ starting from the root and
   finishing at the node associated with location $loc$ to be read or written.

3. For all nodes $x$ in path $q$ (in descending order),

   if $p(x)$ is the root of $H$ then set $k_{p(x)} = EM[1]$;
   else if $x$ is not the root then

     store $z_x$ into $EM[2]$ and $k_{p(x)}$ into $EM[3]$ and set $y_x = D(k_{p(x)}, z_x)$;
     let $k_x \in \{0,1\}^K$ and $j \in \{1, \ldots, l\}$ be such that $y_x = k_x \circ j$;

4. If $ins$ = (Read; $loc$; $EM, PM$; $cont$) then

   set $cont = k_x$, return: $cont$ and halt.

5. If $ins$ = (Write; $loc$; $EM, PM, cont$; $EM, PM$) then

   for all nodes $x$ in path $q$ (in ascending order),
     if $x$ is not a leaf then

       uniformly choose $k'_x \in \{0,1\}^K$;
       for $j = 1, \ldots, l$,

         store $z_{c_j(x)}$ into $EM[2]$ and $k_x$ into $EM[3]$ and set $y_{c_j(x)} = D(k_x, z_{c_j(x)})$;
         let $k_{c_j(x)} \in \{0,1\}^K$ and $j \in \{1, \ldots, l\}$ be s.t. $y_{c_j(x)} = k_{c_j(x)} \circ j$;
         set $z_{c_j(x)} = E(k'_x, y_{c_j(x)})$;
       if $x$ is the root then store $k'_x$ into $EM[1]$;
       else insert the new encryption into list $pm_i$,

       where $i$ is the location in $PM$ associated with node $x$;
       set $k_x = k'_x$;

   return: $(EM, PM)$ and halt.
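
The tree construction can be made concrete with a short sketch. It follows the
overview given earlier (descend from the root decrypting keys, then ascend
choosing fresh keys and re-encrypting the children of each node on the path),
but it is not a transcription of the formal algorithms: the class name
`TreeErasableMemory`, the heap-style node numbering, the 16-byte key and value
size, and the restriction to a number of locations that is a power of l are all
assumptions for illustration. The permutation `E`/`D` is a toy SHA-256 Feistel
network standing in for a block cipher, and the sketch does not model which
intermediate values reside in erasable memory.

```python
import hashlib
import secrets

KEY = 16  # toy parameter: bytes per key and per stored value


def _f(k, i, half):
    return hashlib.sha256(k + bytes([i]) + half).digest()[:KEY]


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def E(k, block):
    """Toy 4-round Feistel permutation on 2*KEY-byte blocks (illustration only)."""
    L, R = block[:KEY], block[KEY:]
    for i in range(4):
        L, R = R, _xor(L, _f(k, i, R))
    return L + R


def D(k, block):
    """Inverse of E under the same key."""
    L, R = block[:KEY], block[KEY:]
    for i in reversed(range(4)):
        L, R = _xor(R, _f(k, i, L)), L
    return L + R


class TreeErasableMemory:
    def __init__(self, data, l=2):
        # Assumes len(data) is a power of l (at least l), each value KEY bytes long.
        self.l = l
        self.first_leaf = (len(data) - 1) // (l - 1)  # number of interior nodes
        keys = [secrets.token_bytes(KEY) for _ in range(self.first_leaf)] + list(data)
        self.em = keys[0]                             # erasable memory: root key only
        # Persistent memory: for every non-root node, all z-values ever written.
        self.pm = {x: [self._enc(keys[(x - 1) // l], keys[x], x)]
                   for x in range(1, len(keys))}

    def _enc(self, parent_key, key, x):
        j = (x - 1) % self.l + 1                      # x is the j-th child of its parent
        return E(parent_key, key + j.to_bytes(KEY, "big"))

    def _path(self, loc):
        """Nodes from the root's child down to the leaf of location loc."""
        x, path = self.first_leaf + loc, []
        while x > 0:
            path.append(x)
            x = (x - 1) // self.l
        return path[::-1]

    def _descend(self, loc):
        """Decrypt along the path; returns {node: current key}, root included."""
        keys, k = {0: self.em}, self.em
        for x in self._path(loc):
            k = D(k, self.pm[x][-1])[:KEY]
            keys[x] = k
        return keys

    def read(self, loc):
        return self._descend(loc)[self.first_leaf + loc]

    def write(self, loc, value):
        path, old = self._path(loc), self._descend(loc)
        new = {path[-1]: value}
        for x in reversed([0] + path[:-1]):           # interior nodes, leaf's parent first
            new[x] = secrets.token_bytes(KEY)
            for c in range(self.l * x + 1, self.l * x + self.l + 1):
                k_c = new[c] if c in new else D(old[x], self.pm[c][-1])[:KEY]
                self.pm[c].append(self._enc(new[x], k_c, c))
        self.em = new[0]                              # the old root key is overwritten
```

In this sketch a write touches l children at each of the log_l m levels, which
matches the logarithmic number of encryptions per access claimed in Theorem 3;
a read only decrypts once per level.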



4 Concrete Realizations of Small Erasable Memory 

We describe different concrete ways of constructing physically erasable memory. 
All of these ways seem prohibitive for large amounts of memory, but much more
reasonable for a small amount such as the one we require. Discussions about
comparing and combining the above methods can be found in 0. 

Trusted erasable memory. The simplest example of an implementation for an 
erasable memory is to use a trusted or guardable piece of memory. In this solution, 
the erasable memory is not so much a construction as a trust relationship. The 
system trusts the erasable memory not to reveal any information, which is at 
least as strong as forgetting former information. In a network system, the erasable 
memory might be a separate computer in a locked room. Due to the better 
physical security of this machine, it is less likely to be corrupted, and thus more 
likely to forget, in the sense that the information is not revealed to other parties. 
In the example of a multi-user personal computer (e.g., a device used by many 
people one at a time), each user might maintain physical control of one piece 
of memory, say a smart card or disk, and use that piece of memory as their 
personal erasable memory. 

Disposable erasable memory. Another construction for erasable memory is 
based on limited physical destruction. More specifically, we can associate the 
physical erasable memory with a disposable memory device, such as an inex- 
pensive smart card or disk. Each time we need to forget a value, a new device 
is obtained, and the old one physically destroyed, e.g., burnt or melted. (This 
frequency can be limited by a combination of methods, e.g., that of a temporarily 
trusted memory that is replaced at regular intervals.) 

Randomly updated erasable memory. The main problem with erasing mem- 
ory in physical sources is that imprints are left if the memory has the same value 
for a long period. Updating with the same value will not prevent this. However, 
it is reasonable to assume that updating a memory device frequently with un- 
correlated random values will not leave traces. Heuristically, our reasoning is as 
follows: The memory device has a maximum amount of retrievable data that can 
be stored on it. If there is no distinction between old and new data in terms of 
time the data has been stored, or correlations with other data, then it seems rea- 
sonable to assume that the newer data will be the easiest to recover. If at most 
a finite amount of data is possible to recover, and newer data is easier to recover 
than older data, it follows that sufficiently old data is impossible to recover. We 
can use this to make a fixed item erasable even if held for a long period of time. 
We distribute the item $m$ by sharing it over two or more memory devices. At
any time, one device holds a random string $r$, and the other $m \oplus r$. The
value of $r$ is updated regularly, and much more frequently than $m$, by XORing
both memory devices with the same random string. (Note that the random strings
do not have to be kept secret, so they could probably be generated
pseudo-randomly even if the seed becomes known later. The randomness is
intended to 'fool' the memory device, not the adversary.) This is reminiscent
of the zero-sharing method in proactive secret sharing schemes.
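
The two-device sharing and its refresh step can be sketched as follows. The
function names `share`, `refresh` and `recover` are hypothetical, and the two
"devices" are simply byte strings; the sketch only demonstrates that XORing
both shares with the same fresh string changes what each device stores while
leaving the shared item unchanged.

```python
import secrets


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def share(item: bytes):
    """Split item into (r, item XOR r) for a fresh random string r."""
    r = secrets.token_bytes(len(item))
    return r, xor(item, r)


def refresh(dev1: bytes, dev2: bytes):
    """XOR both devices with the same fresh random string."""
    rho = secrets.token_bytes(len(dev1))
    return xor(dev1, rho), xor(dev2, rho)


def recover(dev1: bytes, dev2: bytes) -> bytes:
    return xor(dev1, dev2)


secret_item = b"long-lived key material"
d1, d2 = share(secret_item)
for _ in range(1000):            # frequent updates; the item itself never changes
    d1, d2 = refresh(d1, d2)
assert recover(d1, d2) == secret_item
```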

Abstract. We study a logic called FLC (Fixpoint Logic with Chop)
that extends the modal mu-calculus by a chop-operator and termination 
formulae. For this purpose formulae are interpreted by predicate trans- 
formers instead of predicates. We show that any context-free process can 
be characterized by an FLC-formula up to bisimulation or simulation. 
Moreover, we establish the following results: FLC is strictly more expressive
than the modal mu-calculus; it is decidable for finite-state processes
but undecidable for context-free processes; satisfiability and validity are 
undecidable; FLC does not have the finite-model property. 



1 Introduction 

Imperative programming languages typically offer a sequential composition op- 
erator which allows the straightforward specification of behavior proceeding in 
successive phases. Similar operators are provided by interval temporal logics, 
where they are called chop-operators. Important examples are Moszkowski's
Interval Temporal Logic ITL uni and the Duration Calculus DC mi. As far as we
know, however, no point-based temporal logic and, in particular, no branching- 
time logic with a chop operator has been proposed up to now. Indeed, at first 
glance there seems to be no natural way for explaining the meaning of
sequentially composed formulae $\phi_1 ; \phi_2$ in the setting of point-based
temporal or modal logic, as there is no natural notion of where interpretation
of $\phi_1$ stops and interpretation of $\phi_2$ starts.

In this paper we present a logic called FLC (Fixpoint Logic with Chop) 
that extends the modal mu-calculus jEj, a popular point-based branching-time 
fixpoint logic, by a chop operator ; and termination formulae term. For this 
purpose we utilize a ‘second-order’ interpretation of formulae. While (closed) 
formulae of usual temporal logics are interpreted by sets of states, i.e. repre- 
sent predicates, we interpret formulae by mappings from states to states, i.e. by 
predicate transformers. A similar idea has been used by Burkart and Steffen jEj 
in a model checking procedure for modal mu-calculus formulae and context-free 
processes. However, while we use a second-order interpretation of formulae, they 
rely on a second-order interpretation of states as property transformers. 

It turns out that FLC is strictly more expressive than the modal mu-calculus 
but is still decidable for finite-state processes. Consequently, FLC-based model
checking is, in our opinion, an interesting alternative to modal mu-calculus based
model checking as it enables the verification of non-regular properties. The chop-operator
also enables a straightforward specification of phased behavior. Other results 
shown in this paper are that FLC is undecidable for context-free processes, that 
satisfiability (and thus also validity) is undecidable, and that the logic does not 
have the finite model property. These results are inferred from the existence of 
formulae characterizing context-free processes up to bisimulation and simulation. 

The remainder of this paper is structured as follows. The next section recalls
bisimulation, simulation and context-free processes. In Section 3 we introduce
the logic FLC and show that it conservatively extends the modal mu-calculus.
Section 4 shows that context-free processes can be characterized up to
bisimulation and simulation by single FLC-formulae. These facts, besides being
of interest in their own right, provide the main means for establishing the
results on expressiveness and decidability, which are presented in Section 5.
The paper finishes with a discussion of the practical utility of FLC.

2 Preliminaries 

Processes, Bisimulation, and Simulation. A commonly used basic operational
model of processes is that of rooted labeled transition systems. For the
remainder of this paper we assume a given finite set $\mathit{Act}$ of actions.
Then a labeled transition system (over $\mathit{Act}$) is a structure
$T = (S, \mathit{Act}, \to)$, where $S$ is a set of states, and
$\to \subseteq S \times \mathit{Act} \times S$ is a transition relation. We
write $s \xrightarrow{a} s'$ for $(s, a, s') \in \to$. A process is a pair
$P = (T, s_0)$ consisting of a labeled transition system
$T = (S, \mathit{Act}, \to)$ and an initial state (or root) $s_0 \in S$. A
process is called finite-state if the underlying state set $S$ is finite.

Transition systems provide a rather fine-grained model of processes. There- 
fore, various equivalences and preorders have been studied in the literature that 
identify or order processes on the basis of their behavior. Classic examples are 
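strong bisimulation j 1 51 1 I ] denoted by $\sim$ and simulation denoted by $\preceq$.

For two given processes $P = ((S, \mathit{Act}, \to_P), s_0)$ and
$Q = ((T, \mathit{Act}, \to_Q), t_0)$ both bisimulation $\sim$ and simulation
$\preceq$ are first defined as relations between the state sets $S$ and $T$.
These definitions are then lifted to the processes themselves. As relations
between $S$ and $T$ they can be characterized as the greatest fixpoints
$\nu F_\sim$ and $\nu F_\preceq$ of certain monotonic functionals $F_\sim$ and
$F_\preceq$. These functionals operate on the complete lattice of relations
$R \subseteq S \times T$ ordered by set inclusion and are defined by

$F_\sim(R) \stackrel{\mathrm{def}}{=} \{(s,t) \mid \forall a, s' : s \xrightarrow{a}_P s' \Rightarrow \exists t' : t \xrightarrow{a}_Q t' \wedge (s',t') \in R$
$\qquad\qquad \wedge\ \forall a, t' : t \xrightarrow{a}_Q t' \Rightarrow \exists s' : s \xrightarrow{a}_P s' \wedge (s',t') \in R\}$

and $F_\preceq(R) \stackrel{\mathrm{def}}{=} \{(s,t) \mid \forall a, s' : s \xrightarrow{a}_P s' \Rightarrow \exists t' : t \xrightarrow{a}_Q t' \wedge (s',t') \in R\}$.
The processes $P$ and $Q$ are called bisimilar if $s_0 \sim t_0$. Similarly,
$Q$ is said to simulate $P$ if $s_0 \preceq t_0$. By abuse of notation we denote
these relationships by $P \sim Q$ and $P \preceq Q$ and view $\sim$ and
$\preceq$ also as relations between processes.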
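
For finite-state processes, the greatest fixpoints above can be computed by
iterating the functional from the full relation S x T until it stabilizes. The
sketch below does this; the function name `greatest_fixpoint`, the
representation of transitions as sets of triples, and the two small example
processes are assumptions chosen for illustration.

```python
def greatest_fixpoint(S, T, trans_p, trans_q, two_way=True):
    """Iterate R := F(R) from S x T down to the greatest fixpoint.

    two_way=True uses the bisimulation functional, two_way=False the
    simulation functional (pairs (s, t) such that t simulates s).
    """
    def post(trans, s):
        return [(a, s2) for (s1, a, s2) in trans if s1 == s]

    def F(R):
        result = set()
        for s in S:
            for t in T:
                forth = all(any(b == a and (s2, t2) in R for (b, t2) in post(trans_q, t))
                            for (a, s2) in post(trans_p, s))
                back = all(any(b == a and (s2, t2) in R for (b, s2) in post(trans_p, s))
                           for (a, t2) in post(trans_q, t))
                if forth and (back or not two_way):
                    result.add((s, t))
        return result

    R = {(s, t) for s in S for t in T}
    while True:
        nxt = F(R)
        if nxt == R:
            return R
        R = nxt


# P chooses between b and c after a; Q chooses already when doing a.
P_states = {"s0", "s1", "s2", "s3"}
P_trans = {("s0", "a", "s1"), ("s1", "b", "s2"), ("s1", "c", "s3")}
Q_states = {"t0", "t1", "t2", "t3", "t4"}
Q_trans = {("t0", "a", "t1"), ("t1", "b", "t2"), ("t0", "a", "t3"), ("t3", "c", "t4")}

bisim = greatest_fixpoint(P_states, Q_states, P_trans, Q_trans)
q_below_p = greatest_fixpoint(Q_states, P_states, Q_trans, P_trans, two_way=False)
assert ("s0", "t0") not in bisim      # P and Q are not bisimilar
assert ("t0", "s0") in q_below_p      # but P simulates Q
```

The iteration terminates on finite state sets because each step can only remove
pairs from the relation.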
Fig. 1. A context-free process



Context-Free Processes. Context-free processes, also called BPA (basic process
algebra) processes, are a certain type of finitely generated infinite-state
processes. Their name derives from the fact that they are induced by leftmost
derivations of context-free grammars in Greibach normal form, where the
terminal symbols are interpreted as actions and the non-terminals induce the
state. Greibach normal form means that all rules have the form
$A ::= a\alpha$, where $A$ is a non-terminal symbol, $a$ a terminal symbol, and
$\alpha$ a string of non-terminals. Formally, context-free processes can be
defined as an instance of Rewrite Transition
Systems as introduced by Caucal 0 . 

A context-free process rewrite system (over $\mathit{Act}$) is a triple
$R = (V, \mathit{Act}, \Delta)$ consisting of a finite set $V$ of process
variables, the assumed finite set $\mathit{Act}$ of actions, and a finite set
$\Delta \subseteq V \times \mathit{Act} \times V^*$ of rules. The labeled
transition system induced by $R = (V, \mathit{Act}, \Delta)$, called a
context-free transition system, is $T_R = (V^*, \mathit{Act}, \to)$, where
$\to \subseteq V^* \times \mathit{Act} \times V^*$ is the smallest relation
obeying the prefix rewrite rule

$$\text{PRE}\quad \frac{(A, a, \alpha) \in \Delta}{A\beta \xrightarrow{a} \alpha\beta}$$

Note that the states in a context-free transition system are words of process
variables of the underlying context-free process rewrite system. A context-free
process is a pair $(T_R, \alpha_0)$, consisting of a context-free transition
system $T_R$ and an initial state $\alpha_0 \in V^*$. As an example we picture
in Fig. 1 the context-free process $(T_R, A)$ where
$R = (V, \mathit{Act}, \Delta)$, $V = \{A, B\}$, $\mathit{Act} = \{a, b, c\}$,
and $\Delta = \{(A, a, AB), (A, c, \epsilon), (B, b, \epsilon)\}$.
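
The prefix rewrite rule is easy to animate for the example system. In the
sketch below, states are Python strings of single-character process variables
and the empty string plays the role of the empty word; the names `RULES`,
`successors` and `run` are hypothetical. Running it suggests that the words
leading from A to the empty state have the shape a^n c b^n, which gives some
intuition for why the process is not finite-state.

```python
# Rules (A, a, alpha) of the example system; "" stands for the empty word.
RULES = [("A", "a", "AB"), ("A", "c", ""), ("B", "b", "")]


def successors(state, rules=RULES):
    """All (action, next_state) pairs allowed by the prefix rewrite rule.

    Assumes every process variable is a single character, so that the
    leftmost variable of a state is state[0].
    """
    if not state:
        return []
    head, rest = state[0], state[1:]
    return [(a, alpha + rest) for (A, a, alpha) in rules if A == head]


def run(word, start="A", rules=RULES):
    """Set of states reachable from start by performing the actions in word."""
    states = {start}
    for action in word:
        states = {nxt for s in states
                  for (a, nxt) in successors(s, rules) if a == action}
    return states


assert successors("AB") == [("a", "ABB"), ("c", "B")]
assert run("aacbb") == {""}      # a a c b b empties the stack of variables
assert run("aacbbb") == set()    # one b too many: no such run exists
```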

The following two results are crucial for the remainder of this paper: firstly, 
there are context-free processes that are not bisimilar to any finite-state process 
(the process in Fig. 1 is an example) and, secondly, simulation between
context-free processes is undecidable p]. The reader interested in learning more about
context-free processes and other classes of Rewrite Transition Systems is pointed 
to the surveys and | 2 | and the many references there. 



3 The Logic FLC 

In the remainder of this paper the letter X ranges over an infinite set Var of 
variables, a over the assumed finite action set Act and p over an assumed finite 
set Prop of atomic propositions. We assume that Prop contains the propositions 
true and false. 

The modal mu-calculus 0 is a small, yet expressive process logic that has
been used as underlying logic in a number of model checkers. Modal mu-calculus 
formulae in positive normal form are constructed according to the grammar 

$\phi ::= p \mid [a]\phi \mid \langle a\rangle\phi \mid \phi_1 \wedge \phi_2 \mid \phi_1 \vee \phi_2 \mid X \mid \mu X.\phi \mid \nu X.\phi$.

In the modal mu-calculus the modal operators $[a]$ and $\langle a\rangle$ do
not have the status of formulae but can only be used in combination with
already constructed formulae $\phi$ to form composed formulae $[a]\phi$ and
$\langle a\rangle\phi$. We now define FLC (Fixpoint Logic with Chop), an
extension of the modal mu-calculus that gives the modal operators the status of
formulae. More importantly, FLC provides a chop operator $;$, which intuitively
represents sequential composition of behavior, and a termination formula
$\mathit{term}$, which intuitively requires the behavior of the sequential
successor formula.

We consider again formulae in positive form which are now constructed
according to the following grammar:

$\phi ::= p \mid [a] \mid \langle a\rangle \mid \phi_1 \wedge \phi_2 \mid \phi_1 \vee \phi_2 \mid X \mid \mu X.\phi \mid \nu X.\phi \mid \mathit{term} \mid \phi_1 ; \phi_2$.

As in the modal mu-calculus, the two fixpoint operators $\mu X$ and $\nu X$
bind the respective variable $X$ and we will apply the usual terminology of
free and bound variables in a formula, closed formula etc. Moreover, we write
for a finite set $M$ of formulae $\bigwedge M$ and $\bigvee M$ for the
conjunction and disjunction of the formulae in $M$. As usual, we agree that
$\bigwedge \emptyset = \mathit{true}$ and $\bigvee \emptyset = \mathit{false}$.

Both the modal mu-calculus as well as FLC are basically interpreted over a
given labeled transition system $T = (S, \mathit{Act}, \to)$. Furthermore, an
interpretation $I \in (\mathit{Prop} \to 2^S)$ is assumed, which assigns to each
atomic proposition the set of states for which it is valid. We assume that
interpretations always interpret true and false in the standard way, i.e. such
that $I(\mathit{true}) = S$ and $I(\mathit{false}) = \emptyset$.

In the modal mu-calculus the meaning of a closed formula essentially is a
subset of the state set $S$, i.e. a predicate on states. In order to explain
the meaning of the new types of formulae, we interpret FLC-formulae by
monotonic predicate transformers. A (monotonic) predicate transformer is simply
a mapping $f : 2^S \to 2^S$ which is monotonic w.r.t. the inclusion ordering on
$2^S$. It follows from well-known results of lattice theory that the set of
monotonic predicate transformers, which we denote by $\mathit{MTrans}_T$,
together with the pointwise extension $\sqsubseteq$ of the inclusion ordering on
$2^S$ defined by

$f \sqsubseteq f'$ iff $f(x) \subseteq f'(x)$ for all $x \subseteq S$

is a complete lattice. We denote the join and meet operations by $\sqcup$ and $\sqcap$.

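It is customary to refer to environments, in order to explain the meaning of
open formulas. In the modal mu-calculus, environments are partial mappings of
type $\rho : \mathit{Var} \to 2^S$; they interpret (at least) the free variables
of the formula in question by a set of states. In FLC we interpret free
variables by predicate transformers. Thus, we use environments of type
$\delta : \mathit{Var} \to \mathit{MTrans}_T$. The predicate transformer
assigned to an FLC-formula $\phi$, denoted by $\mathcal{C}_T^I(\phi)(\delta)$,
is inductively defined in Fig. 2. The similar definition of the predicate
$\mathcal{M}_T^I(\phi)(\rho)$ assigned to a modal mu-calculus formula $\phi$ is
omitted due to lack of space. It can be found in many papers on the modal
mu-calculus.

$\mathcal{C}_T^I(p)(\delta)(x) = I(p)$
$\mathcal{C}_T^I([a])(\delta)(x) = \{s \mid \forall s' : s \xrightarrow{a} s' \Rightarrow s' \in x\}$
$\mathcal{C}_T^I(\langle a\rangle)(\delta)(x) = \{s \mid \exists s' : s \xrightarrow{a} s' \wedge s' \in x\}$
$\mathcal{C}_T^I(\phi_1 \wedge \phi_2)(\delta)(x) = \mathcal{C}_T^I(\phi_1)(\delta)(x) \cap \mathcal{C}_T^I(\phi_2)(\delta)(x)$
$\mathcal{C}_T^I(\phi_1 \vee \phi_2)(\delta)(x) = \mathcal{C}_T^I(\phi_1)(\delta)(x) \cup \mathcal{C}_T^I(\phi_2)(\delta)(x)$
$\mathcal{C}_T^I(X)(\delta) = \delta(X)$
$\mathcal{C}_T^I(\mu X.\phi)(\delta) = \sqcap \{f \in \mathit{MTrans}_T \mid \mathcal{C}_T^I(\phi)(\delta[X \mapsto f]) \sqsubseteq f\}$
$\mathcal{C}_T^I(\nu X.\phi)(\delta) = \sqcup \{f \in \mathit{MTrans}_T \mid \mathcal{C}_T^I(\phi)(\delta[X \mapsto f]) \sqsupseteq f\}$
$\mathcal{C}_T^I(\mathit{term})(\delta)(x) = x$
$\mathcal{C}_T^I(\phi_1 ; \phi_2)(\delta) = \mathcal{C}_T^I(\phi_1)(\delta) \circ \mathcal{C}_T^I(\phi_2)(\delta)$

Fig. 2. Semantics of FLC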

Note that the fixpoint formulae of FLC are interpreted by the corresponding 
fixpoints in the set of predicate transformers and not in the set of predicates 
as in the modal mu-calculus. Also note that the chop operator is interpreted by 
functional composition and that term denotes the identity predicate transformer. 
Thus, term is the neutral element of ;. 

As the meaning of a closed formula $\phi$ does not depend on the environment, we sometimes write just $C_T^I(\phi)$ ($M_T^I(\phi)$) for $C_T^I(\phi)(\delta)$ ($M_T^I(\phi)(\rho)$), where $\delta$ ($\rho$) is an arbitrary environment. We also omit the indices $T$ and $I$ if they are clear from the context.

The set of states satisfying a given closed formula $\phi$ is $C(\phi)(S)$. A process $P = (T, s_0)$ is said to satisfy $\phi$ if its initial state $s_0$ satisfies $\phi$. It might appear somewhat arbitrary that the predicate transformer $C(\phi) : 2^S \to 2^S$ is applied to the full state set $S$ in the definition of satisfaction. As far as expressiveness is concerned, however, the choice of a specific set $x$ to which $C(\phi)$ is applied is largely arbitrary, as long as $x$ can be described by a closed FLC formula $\phi_x$: assume $x = C(\phi_x)(S)$; then $C(\phi)(x)$ equals $C(\phi ; \phi_x)(S)$. As Lemma 1 below shows, sets $x$ expressible in this way include at least all state sets that can be described by a modal mu-calculus formula (i.e. all modal mu-calculus definable properties). The formula $\phi_{DL} \stackrel{\mathrm{def}}{=} \bigwedge_{a \in Act} [a] ; false$, for instance, characterizes the set of deadlocked states.
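The predicate-transformer reading of the operators can be made concrete on a small finite transition system. The following Python fragment is only a sketch under stated assumptions: the transition system `TRANS` and all function names are hypothetical and chosen for illustration, and transformers are modelled as plain functions on frozensets of states. It shows the chop operator as function composition, term as the identity, and the deadlock formula $\phi_{DL}$.

```python
# Hypothetical toy transition system (not from the paper): states 0..3.
STATES = frozenset({0, 1, 2, 3})
ACTIONS = ("a", "b")
TRANS = {(0, "a", 1), (1, "b", 2), (0, "b", 3)}


def succ(s, a):
    """Set of a-successors of state s."""
    return {t for (u, act, t) in TRANS if u == s and act == a}


def box(a):
    """[a]: states all of whose a-successors lie in x."""
    return lambda x: frozenset(s for s in STATES if succ(s, a) <= x)


def dia(a):
    """<a>: states with at least one a-successor in x."""
    return lambda x: frozenset(s for s in STATES if succ(s, a) & x)


def const(states):
    """Atomic propositions, true and false: constant transformers."""
    return lambda x: frozenset(states)


def term(x):
    """term: the identity predicate transformer."""
    return frozenset(x)


def chop(f, g):
    """phi1 ; phi2 is interpreted by functional composition."""
    return lambda x: f(g(x))


def conj(f, g):
    return lambda x: f(x) & g(x)


false = const(())

# phi_DL: the conjunction over all actions a of [a] ; false.
phi_dl = const(STATES)
for action in ACTIONS:
    phi_dl = conj(phi_dl, chop(box(action), false))

print(sorted(phi_dl(STATES)))  # the deadlocked states of the toy system
```

In this toy system the states 2 and 3 have no outgoing transitions, so they are exactly the states returned by `phi_dl`. Applying `dia("b")` to that set gives the same result as applying `chop(dia("b"), phi_dl)` to the full state set, which is the observation made above about the choice of the argument set.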

Any modal mu-calculus formula $\phi$ can straightforwardly be translated to FLC: just replace all sub-formulas of the form $[a]\psi$ or $\langle a \rangle\psi$ by $[a] ; \psi$ or $\langle a \rangle ; \psi$, respectively. We call the resulting FLC-formula $T(\phi)$. A rather straightforward structural induction shows that the interpretation of $T(\phi)$ is just the constant predicate transformer mapping any state set to the interpretation of the original modal mu-calculus formula.

Lemma 1. Let $\phi$ be a modal mu-calculus formula and $\rho : Var \to 2^S$ a modal mu-calculus environment. Let $\delta$ be the environment defined by $dom\,\delta = dom\,\rho$ and $\delta(X) = \lambda y . \rho(X)$ for $X \in dom\,\rho$. (Note that $\delta$ assigns constant predicate transformers to the variables.) Then $C(T(\phi))(\delta) = \lambda y . M(\phi)(\rho)$.

As a consequence, FLC is at least as expressive as the modal mu-calculus. 

Corollary 1. Suppose $\phi$ is a closed modal mu-calculus formula and $P$ is a process. Then $P$ satisfies $\phi$ (in the sense of the mu-calculus) iff $P$ satisfies $T(\phi)$ (in the sense of FLC).
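The translation from the modal mu-calculus to FLC is purely syntactic and can be sketched in a few lines. The tuple encoding of formulae used below is an assumption made for this illustration only (it does not come from the paper): mu-calculus modalities carry a subformula, whereas FLC modalities are atomic formulae combined with chop.

```python
def translate(phi):
    """Sketch of the translation T from mu-calculus to FLC syntax.

    Assumed (hypothetical) encoding of mu-calculus formulae:
      ("prop", p), ("var", X), ("and", f, g), ("or", f, g),
      ("box", a, f), ("dia", a, f), ("mu", X, f), ("nu", X, f).
    FLC output uses atomic modalities ("box", a) / ("dia", a)
    and the chop constructor ("chop", f, g).
    """
    tag = phi[0]
    if tag in ("prop", "var"):
        return phi
    if tag in ("and", "or"):
        return (tag, translate(phi[1]), translate(phi[2]))
    if tag in ("box", "dia"):
        # [a]psi becomes [a] ; T(psi), and <a>psi becomes <a> ; T(psi).
        return ("chop", (tag, phi[1]), translate(phi[2]))
    if tag in ("mu", "nu"):
        return (tag, phi[1], translate(phi[2]))
    raise ValueError(f"unknown constructor: {tag!r}")


reach_p = ("mu", "X", ("or", ("prop", "p"), ("dia", "a", ("var", "X"))))
print(translate(reach_p))
```

The translated formula has the same shape as the input; only the modal subformulae are rewritten into chops.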

Equation systems. A (closed) equation system of FLC-formulae is a set $E = \{X_i = \phi_i \mid 1 \le i \le n\}$ consisting of $n > 0$ equations $X_i = \phi_i$, where $X_1, \ldots, X_n$ are mutually distinct variables and $\phi_1, \ldots, \phi_n$ are FLC-formulae having at most $X_1, \ldots, X_n$ as free variables. An environment $\delta : \{X_1, \ldots, X_n\} \to MTrans$ is a solution of the equation system $E$ if $\delta(X_i) = C(\phi_i)(\delta)$ for $i = 1, \ldots, n$. By the Knaster-Tarski fixpoint theorem every equation system has a largest solution, as the corresponding functional on environments is easily seen to be monotonic. We denote the largest solution of $E$ by $\nu E$.

While it proves convenient to refer to equation systems, they do not increase the expressive power. Any predicate transformer that can be obtained as a component of the largest solution of an equation system $E$ can just as well be characterized by a single formula. In order to show this, Gauß elimination [?] can be applied to the equation system (see e.g. [?]).

Proposition 1. Let $E$ be a closed equation system and $X$ a variable bound in $E$. Then there is a closed FLC-formula $\phi$ such that $C(\phi) = (\nu E)(X)$.

4 Characteristic Formulae for Context-Free Processes 

The goal of this section is to show that any context-free process can be characterized up to bisimulation or simulation by an FLC-formula.^ As a stepping stone, we construct equation systems that capture the contribution of the single process variables to bisimulation and simulation. Characteristic formulae for various other (bi-)simulation-like relations, in particular the weak versions, can be constructed along this line too.

In the following, we assume given a context-free process rewrite system $R = (V, Act, \Delta)$ and agree on the following variable conventions: the letters $A$ and $B, B_1, B_2, \ldots$ range over $V$, $a$ ranges over $Act$, and $\alpha$ and $\beta$ range over $V^*$. For notational convenience we use the process variables $A \in V$ also as variables of the logic.

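We consider the three equation systems $E_\sim = \{A = \phi_{\sim A} \mid A \in V\}$, $E_\preceq = \{A = \phi_{\preceq A} \mid A \in V\}$, and $E_\succeq = \{A = \phi_{\succeq A} \mid A \in V\}$. Analogously to the finite-state case [?], the formulae $\phi_{\sim A}$, $\phi_{\preceq A}$ and $\phi_{\succeq A}$ mirror the conditions in the definition of bisimulation and simulation and are defined by

\[
\begin{aligned}
\phi_{\sim A} &\stackrel{\mathrm{def}}{=} \phi_{\succeq A} \wedge \phi_{\preceq A}, \\
\phi_{\preceq A} &\stackrel{\mathrm{def}}{=} \bigwedge_{a \in Act} [a] ; \bigvee_{(A, a, B_1 \cdots B_l) \in \Delta} B_1 ; \cdots ; B_l, \text{ and} \\
\phi_{\succeq A} &\stackrel{\mathrm{def}}{=} \bigwedge_{a \in Act} \; \bigwedge_{(A, a, B_1 \cdots B_l) \in \Delta} \langle a \rangle ; B_1 ; \cdots ; B_l .
\end{aligned}
\]

^ For the simulation case we shall actually construct two formulae. One of them characterizes the set of processes that are simulated by the process in question and the other the set of processes that simulate it.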

Now, suppose given an arbitrary transition system $T = (S, Act, \to)$ and an arbitrary interpretation $I : Prop \to 2^S$. (The specific interpretation does not matter as only the atomic propositions true and false appear in the characteristic equation systems.) Let $\delta_\sim = \nu E_\sim : V \to (2^S \to 2^S)$ be the largest solution of $E_\sim$ on $T$. The following lemma intuitively shows that the $A$-component of this solution represents the contribution of the process variable $A$ to bisimulation.

Lemma 2. $\delta_\sim(A)(\{s \in S \mid s \sim \beta\}) = \{s \in S \mid s \sim A\beta\}$ for all $A \in V$, $\beta \in V^*$.

The ‘$\supseteq$’-direction can be proved by a fixpoint induction for $\delta_\sim$ and the ‘$\subseteq$’-direction by a fixpoint induction for $\sim\ = \nu F_\sim$. Combined with Proposition 1, Lemma 2 shows that there is a closed formula $\phi_{\sim A}$ for each $A \in V$ such that for all $\beta \in V^*$:

$C_T(\phi_{\sim A})(\{s \in S \mid s \sim \beta\}) = \{s \in S \mid s \sim A\beta\}$. (1)

These formulae $\phi_{\sim A}$ can now be used to construct characteristic formulae for context-free processes with underlying process rewrite system $R$.

Theorem 1 (Characteristic formulae). For each context-free process $P$ there is a (closed) FLC-formula $\psi_{\sim P}$ such that, for any process $Q$, $Q$ satisfies $\psi_{\sim P}$ iff $Q \sim P$.

Proof. Let $P = (T_R, B_1 \cdots B_l)$ and let $\psi_{\sim P}$ be the formula $\phi_{\sim B_1} ; \cdots ; \phi_{\sim B_l} ; \phi_{DL}$, where $\phi_{DL}$ is the formula characterizing the set of deadlocked states from Section 3.

Suppose $Q = ((S, Act, \to), s_0)$ is an arbitrary process. Clearly, a state $s \in S$ is bisimilar to the state $\varepsilon$ in $T_R$ if and only if it satisfies $\phi_{DL}$. It follows by repeated application of (1) that, for $i = 1, \ldots, l$, a state $s \in S$ satisfies $\phi_{\sim B_i} ; \cdots ; \phi_{\sim B_l} ; \phi_{DL}$ if and only if $s \sim B_i \cdots B_l$. Thus, $Q$ is bisimilar to $P$ if and only if it satisfies $\psi_{\sim P}$. □

An analogue of Lemma 2 for $E_\preceq$ and $E_\succeq$ ensures the existence of closed formulae $\phi_{\preceq A}$ and $\phi_{\succeq A}$ such that $C_T(\phi_{\preceq A})(\{s \in S \mid s \preceq \beta\}) = \{s \in S \mid s \preceq A\beta\}$ and $C_T(\phi_{\succeq A})(\{s \in S \mid s \succeq \beta\}) = \{s \in S \mid s \succeq A\beta\}$ for arbitrary $A$ and $\beta$. These formulae are used to establish the final theorem of this section.

Theorem 2 (Characteristic formulae for simulation). For each context-free process $P$ there are (closed) FLC-formulae $\psi_{\preceq P}$ and $\psi_{\succeq P}$ such that, for any process $Q$, $Q$ satisfies $\psi_{\preceq P}$ iff $Q \preceq P$, and $Q$ satisfies $\psi_{\succeq P}$ iff $Q \succeq P$.

Proof. Let $P = (T_R, B_1 \cdots B_l)$ and $Q = ((S, Act, \to), s_0)$.

A state $s \in S$ is simulated by $\varepsilon$ if and only if it satisfies $\phi_{DL}$. Thus, $\psi_{\preceq P}$ can be chosen as the formula $\phi_{\preceq B_1} ; \cdots ; \phi_{\preceq B_l} ; \phi_{DL}$.

On the other hand, every state $s \in S$ simulates $\varepsilon$. In other words, a state simulates $\varepsilon$ if and only if it satisfies the formula true. Thus, $\psi_{\succeq P}$ can be chosen as the formula $\phi_{\succeq B_1} ; \cdots ; \phi_{\succeq B_l} ;$ true.^ □

5 Decidability and Expressiveness Issues 

Clearly, FLC is decidable for finite-state processes: given a finite-state process $P = (T, s_0)$, an interpretation $I$, and an FLC-formula $\phi$, $C_T^I(\phi)$ can effectively be computed inductively over $\phi$. The usual approximation of fixpoints terminates as $MTrans_T$ is finite.

Theorem 3. FLC is decidable for finite-state processes. 

However, FLC is not decidable for context-free processes. This is a consequence of the existence of characteristic formulae for simulation. A decision procedure for FLC could be used to decide simulation between context-free processes, which is, as mentioned earlier, undecidable: given two context-free processes $P$ and $Q$ one would just have to check whether $Q$ satisfies $\psi_{\succeq P}$ in order to decide whether $P \preceq Q$.

Theorem 4. FLC is undecidable for context-free processes. 

There is an interesting duality between the decidability of FLC for finite- 
state processes and the decidability of the modal mu-calculus for context-free 
(and even push-down) processes [?]. Both scenarios relate an inherently ‘regular’
structure with a structure of at least ‘context-free strength’. While the former 
is concerned with the at least ‘context-free’ logic FLC and ‘regular’ finite-state 
processes, the latter relates the ‘regular’ modal mu-calculus (recall that the mu- 
calculus can be translated to monadic second order logic, which closely corre- 
sponds to finite automata) with context-free processes. 

The existence of characteristic formulae for simulation also implies that satisfiability (and hence validity) of FLC is undecidable: assume given two context-free processes $P$ and $Q$. It is easy to see that $Q$ simulates $P$ if and only if the formula $\psi_{\succeq P} \wedge \psi_{\preceq Q}$ is satisfiable. Thus, decidability of satisfiability would again imply decidability of simulation between context-free processes.

Theorem 5. Satisfiability and validity of FLC are undecidable. 

An interesting consequence of the existence of characteristic formulae for bisimulation is that FLC does not enjoy the finite-model property:^ choose a context-free process $P$ that is not bisimilar to any finite-state process and take its characteristic formula $\psi_{\sim P}$. Then this formula is satisfiable (namely by $P$ itself). But it cannot be satisfied by a finite-state process, as this finite-state process would then be bisimilar to the context-free process, which yields a contradiction.

^ The final true could be omitted due to our definition of satisfaction.

^ A modal logic has the finite-model property, if any satisfiable formula has a finite model.

0 -a-> 1 -b-> 2 -a-> 3 -b-> 4 -b-> 5 -c-> 6
Fig. 3. A linear, finite process

Theorem 6. FLC does not enjoy the finite-model property. 

The modal mu-calculus on the other hand does enjoy the finite-model property [?]. Hence, in general, context-free processes cannot have characteristic modal mu-calculus formulae: let $P$ again be a context-free process that is not bisimilar to a finite-state process and assume there were a modal mu-calculus formula $\phi$ characterizing $P$ up to bisimulation. Then, by the finite-model property, there would be a finite-state process $Q$ satisfying $\phi$. But this would mean that $P$ and $Q$ are bisimilar, which contradicts the choice of $P$.

As a consequence, FLC is strictly more expressive than the modal mu-calculus. However, if this increase of expressiveness did not show through on finite-state processes, it would be useless as far as automatic model checking is concerned.

Fortunately, we can show that FLC already is more expressive on the class of finite-state processes, and even on a small subclass, that of finite linear processes. These are processes corresponding to finite words over $Act$. Formally, the process corresponding to a word $w = w_0 \cdots w_{k-1} \in Act^*$ is $P_w = ((\{0, \ldots, k\}, Act, \to), 0)$, where $\to\ = \{(i, w_i, i+1) \mid 0 \le i < k\}$. As an example, the process corresponding to the word ababbc is pictured in Fig. 3. The class of finite linear processes is $\{P_w \mid w \in Act^*\}$ and subclasses of it can straightforwardly be identified with sets of words over $Act$. The modal mu-calculus can be translated to monadic second-order logic. Therefore, the class of finite linear models of a modal mu-calculus formula $\phi$ corresponds to a regular set of words. The class of finite linear models of the FLC-formula $(\mu X . (term \vee \langle a \rangle ; X ; \langle b \rangle)) ; \phi_{DL}$, however, corresponds to the set $\{a^n b^n \mid n = 0, 1, \ldots\}$, which is well-known not to be regular [?].

Theorem 7. FLC is strictly more expressive than the modal mu-calculus, even on finite linear processes, and therefore also on finite-state processes.
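The argument behind this theorem can be replayed mechanically on small inputs. The following Python fragment is a minimal sketch, not the paper's algorithm: it builds the finite linear process for a word, tabulates predicate transformers as dictionaries over all state subsets, and computes the least fixpoint for $\mu X . (term \vee \langle a \rangle ; X ; \langle b \rangle)$ by the usual iteration from the bottom element. The function names and the tabulated representation are assumptions made for this illustration, and the approach is exponential in the number of states.

```python
from itertools import combinations


def linear_process(word):
    """P_w: states 0..k and a transition i --word[i]--> i+1 for each i < k."""
    states = frozenset(range(len(word) + 1))
    trans = {(i, word[i], i + 1) for i in range(len(word))}
    return states, trans


def subsets(states):
    items = sorted(states)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def satisfies_anbn(word):
    """Does P_w satisfy (mu X.(term \\/ <a>;X;<b>)) ; phi_DL ?"""
    states, trans = linear_process(word)
    all_x = subsets(states)

    def dia(action, x):
        # <action> applied to x: states with an action-successor in x.
        return frozenset(s for (s, a, t) in trans if a == action and t in x)

    # Fixpoint iteration in the finite lattice of tabulated transformers,
    # starting from the bottom transformer (constantly the empty set).
    f = {x: frozenset() for x in all_x}
    while True:
        g = {x: x | dia("a", f[dia("b", x)]) for x in all_x}
        if g == f:
            break
        f = g

    # phi_DL denotes the deadlocked states (no outgoing transition at all).
    deadlocked = frozenset(s for s in states
                           if not any(u == s for (u, _, _) in trans))
    # Satisfaction applies the meaning of the formula to the full state
    # set; because of the chop this amounts to f applied to phi_DL's set.
    return 0 in f[deadlocked]


for w in ["", "ab", "aabb", "aab", "abab", "ababbc"]:
    print(repr(w), satisfies_anbn(w))
```

On these inputs the check succeeds exactly for the words of the form $a^n b^n$; the word ababbc of Fig. 3 is rejected.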

It is interesting to note that FLC can even characterize certain non-context-free sets of finite linear processes due to the presence of conjunction: let, for arbitrary actions $a$ and $b$, $\phi_{ab}$ be the formula $\mu X . (term \vee (\langle a \rangle ; X ; \langle b \rangle))$ and $\phi_a$ be the formula $\mu X . (term \vee \langle a \rangle ; X)$. Then the finite linear models of

$(\phi_{ab} ; \phi_c ; \phi_{DL}) \wedge (\phi_a ; \phi_{bc} ; \phi_{DL})$

correspond to the set $\{a^n b^n c^n \mid n = 0, 1, \ldots\}$, which is context-sensitive but not context-free. A more thorough study of the expressiveness of FLC is left for future research.
6 Conclusion

We have proposed a modal logic FLC with fixpoints, a chop-operator, and termination formulae. The basic idea has been to interpret formulae by predicate transformers instead of predicates and to take fixpoint construction over predicate transformers as well. As a stepping stone in the technical development we have shown that FLC can characterize context-free processes up to bisimulation and simulation. FLC is strictly more expressive than the modal mu-calculus but is still decidable for finite-state processes.

Like the modal mu-calculus, FLC is perhaps not so much suited as a direct 
vehicle for specification. Rather it provides an expressive core logic, into which 
various other logics can be translated. An FLC-based model checking system 
could handle non-regular specification formalisms that are beyond the reach 
of modal mu-calculus based model checkers. An interesting example of such a formalism from a practical point of view is the timing diagrams studied by K. Fisler in [?]. It is a topic of future research whether they can actually be embedded into FLC.

A simple global model checking algorithm for FLC and finite-state processes can straightforwardly be constructed from the usual iterative computation of fixpoints. This procedure in general has an exponentially larger storage requirement compared to a straightforward global modal mu-calculus model checker: we have to store a mapping $2^S \to \mathbb{B}$ per state and formula (where $\mathbb{B}$ denotes the set of the Boolean values true and false) instead of just a single Boolean value.^ Also the time complexity of FLC seems to be much higher than that of the modal mu-calculus, as $(2^S \to 2^S)$ has exponentially longer chains than $2^S$, such that fixpoint computation can require exponentially more iterations. Thus, at first glance model checking FLC seems to be impractical. (We currently know that model checking with a fixed formula is at least PSPACE-hard.)

However, the exponential blow-up can be avoided for FLC-formulae corresponding to modal mu-calculus formulae. The idea is to represent the above mentioned mappings of type $2^S \to \mathbb{B}$ by binary decision diagrams (BDDs). As a consequence of Lemma 1, these mappings are constant for all FLC-formulae corresponding to modal mu-calculus formulae. It is, moreover, easy to see that the intermediate functions occurring during fixpoint iteration are constant too and correspond to the Boolean values that would be observed in a mu-calculus model checking procedure. Therefore, only a linear penalty arises for both space and time when model checking FLC-formulae corresponding to modal mu-calculus formulae, because constant BDDs can be represented in constant space. In this sense the increased expressiveness of FLC is obtained for free: the exponential blow-up can only occur in cases that cannot be handled by a modal mu-calculus model checker at all!

The above comparison applies to straightforward global model checking. If and how more elaborate global and local mu-calculus model checking procedures can be adapted to FLC remains to be seen. Other topics for future research are a more thorough study of the complexity and expressiveness of FLC, in particular its relationship to context-free and context-sensitive languages and, last but not least, the implementation and empirical evaluation of an FLC-based model checker. It is, moreover, interesting to study whether the idea of a ‘second-order’ interpretation of formulae by predicate transformers can advantageously be applied to other logics.

^ A collection consisting of one of those mappings for each state in $S$ represents a mapping $2^S \to 2^S$, i.e. a predicate transformer, which is the meaning of a formula.
Abstract. This paper presents a completeness result for a first-order interval temporal logic, called Neighbourhood Logic (NL), which has two neighbourhood modalities. NL can support the specification of liveness and fairness properties of computing systems as well as the formalisation of many concepts of real analysis. These two modalities are also adequate in the sense that they can derive other important unary and binary modalities of interval temporal logic. We prove the completeness result for NL by giving a Kripke model semantics and then mapping the Kripke models to the interval models for NL.



1 Introduction 

In many applications, digital systems reacting with their environment and events have to produce an output before a certain delay has elapsed. Time requirements, both qualitative and quantitative, have to be considered to reason about such systems. Thus, for such purposes one has to consider a real-time logic. Various such logics have been proposed. Some of these formalisms interpret formulas over intervals of time [?]; notable among them are Interval Temporal Logic (ITL) [?] and Duration Calculus (DC) [?]. ITL is a first-order interval modal logic which uses a binary modal operator “⌢” which is interpreted as the operation of “chopping” an interval into two parts. DC is an extension of ITL in the sense that temporal variables are written in the form of the integrals of “states”.

Since chop is a contracting modality, ITL-based logics can succinctly express properties of real-time systems such as “for all time intervals of a given length, $\phi$ must be true”, or “if $\phi$ holds for a time interval, then there is a sub-interval where $\psi$ holds”. However, these logics cannot express liveness properties, which depend on intervals lying outside the reference interval, such as “eventually there is an interval where $\phi$ is true”, and “$\phi$ will hold infinitely often in the future”.

Another limitation of these logics is that when they are used in the specification of hybrid systems, the notions of real analysis such as limit, continuity and differentiability cannot be suitably formalised in them. These notions are neighbourhood properties of a point which cannot be defined in those logics. Although an informal mathematical theory of calculus can be assumed, as in extended duration calculus [?], Hybrid Statecharts [?], Hybrid Automata [?] and TLA+ [?], a formalization of real analysis may help in developing theorem provers for supporting the design of hybrid systems.

In order to improve the expressiveness of ITL, expanding modalities have been used. Venema [?] gives a complete axiomatization of a propositional calculus with three binary modalities; in addition to chop (designated as C) it has modalities T and D, which can represent properties outside the interval. Some of the axioms and rules in it are quite complicated. Other expanding modalities which are unary have been considered in Halpern and Shoham [?]. But many notions of real analysis cannot be formalised without first-order quantifiers.

In [?], Zhou and Hansen proposed a first-order interval logic called Neighbourhood Logic (NL) which has provisions for specifying liveness and fairness properties as well as formalising some notions in real analysis. This logic has two expanding modalities, $\Diamond_l$ and $\Diamond_r$, called the left and right neighbourhood modality respectively. These modalities refer to some past and future intervals of time respectively, with respect to the original interval of time being observed.

Although it is not very hard to see that Propositional NL is complete with respect to Kripke models, it seems to be quite inadequate to derive the modalities considered in [?] (but not conversely). Moreover, Propositional NL forms a fragment of the complete logic proposed by Venema in [?]. Nevertheless, the adequacy of the neighbourhood modalities can be established by deriving the other unary and binary modalities of [?] in a first-order logic of the neighbourhood modalities and the interval length (cf. [?]). Thus first-order Neighbourhood Logic seems to have more expressive power than those of [?] with a minimum number of modalities.

This paper presents the syntax and semantics of first-order Neighbourhood Logic and then establishes a completeness result. First, we establish a completeness theorem for the NL formulas in the Kripke model (or possible world model). Then we map the Kripke model to the interval model and prove the completeness of NL in the interval model. Dutertre [?] has proved a similar completeness result for ITL with chop modality. Both follow the approach suggested by [?].

2 Neighbourhood Logic (NL) 

2.1 Syntax of NL 

A language $\mathcal{L}$ for NL consists of an infinite collection of global variables, $V = \{x, y, z, \ldots\}$, and also an infinite collection of temporal variables, $T = \{\ell, v_1, v_2, \ldots\}$, where $\ell$ is a special symbol which will denote the length of an interval. The “natural” properties of length will be captured by the axioms of the logic to be introduced later. In addition, the language contains an infinite set of global function symbols $F$ and global predicate symbols $H$. These symbols are called global because their meaning will be independent of time. With each of the function and predicate symbols is associated an arity $n \ge 0$. Function symbols of arity 0 will be called constants. Predicate symbols of arity 0 are propositions, which include two Boolean symbols true and false. $F$ includes the symbols $+$, $-$ and $H$ includes $=$, $\ge$ etc. There is also an infinite set of temporal propositional letters $P = \{X, Y, \ldots\}$ which will be interpreted as Boolean-valued functions on intervals. The vocabulary also consists of the propositional connectives $\neg$ and $\vee$, the existential quantifier $\exists$, the left neighbourhood modality $\Diamond_l$ and the right neighbourhood modality $\Diamond_r$. The other usual connectives as well as the universal quantifier $\forall$ are introduced as abbreviations.

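The terms, denoted as $\theta, \theta_i$, are defined by the following abstract syntax:

$\theta ::= x \mid \ell \mid v \mid f^n(\theta_1, \ldots, \theta_n)$; $x \in V$, $v \in T$, $f \in F$.

The formulas, denoted as $\phi, \psi$, are defined by the following abstract syntax:

$\phi ::= X \mid G^n(\theta_1, \ldots, \theta_n) \mid \neg\phi \mid \phi \vee \psi \mid (\exists x)\phi \mid \Diamond_l \phi \mid \Diamond_r \phi$; $x \in V$, $X \in P$, $G \in H$.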

A term is global or rigid, if it does not contain any temporal variables. A 
formula is global or rigid, if it does not contain any temporal variables, any 
temporal propositional letters, or any neighbourhood modalities. 



2.2 Semantics of NL 

We fix our domain to be a non-empty set $\mathbb{D}$ (containing the constant symbol 0) which will be the underlying representation of time as well as lengths of intervals. Traditional semantics, however, distinguishes between a temporal domain $\mathbb{T}$ (which is generally a totally ordered set) and a duration domain $\mathbb{D}$ which represents durations of time intervals. The duration domain satisfies certain constraints since its elements are supposed to measure “lengths” of time intervals (cf. [?]). Here, for simplicity, we take the time domain to be the same as the duration domain. As in [?], we want $\mathbb{D}$ to have certain properties which are specified by the following axioms.

D1 Axioms for $=$:
   The standard axioms for $=$ are assumed (cf. [?]).

D2 Axioms for $+$:
   1. $x + 0 = x$.
   2. $x + y = y + x$.
   3. $x + (y + z) = (x + y) + z$.
   4. $(x + y = x + z) \Rightarrow y = z$.

D3 Axioms for $\ge$:
   1. $0 \ge 0$.
   2. $x \ge 0 \wedge y \ge 0 \Rightarrow x + y \ge 0$.
   3. $x \ge y \Leftrightarrow \exists z \ge 0 . (x = y + z)$.
   4. $\neg(x \ge y) \Leftrightarrow (y > x)$,
   where we write $y > x$ if $(y \ge x) \wedge (y \ne x)$.

D4 Axioms for $-$:
   1. $x - y = z \Leftrightarrow x = y + z$.

Clearly, $\mathbb{R}$ (the reals), $\mathbb{Q}$ (the rationals) and $\mathbb{Z}$ (the integers) are examples of domains which satisfy the above axioms. From the above axioms it can be shown that $(\mathbb{D}, +)$ is a commutative group with 0 as the additive identity and $-y$ as the additive inverse of $y$.
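As a sanity check, the domain axioms can be tested on finite samples of integers and rationals. The Python sketch below is only an illustration (the function name is hypothetical, and a finite sample check is of course not a proof); it follows the numbering D2 to D4 of the axioms as listed above.

```python
from fractions import Fraction
from itertools import product


def check_domain_axioms(sample):
    """Test the axioms D2-D4 on every triple drawn from a finite sample."""
    assert 0 >= 0                                          # D3.1
    for x, y, z in product(sample, repeat=3):
        assert x + 0 == x                                  # D2.1
        assert x + y == y + x                              # D2.2
        assert x + (y + z) == (x + y) + z                  # D2.3
        assert (x + y != x + z) or (y == z)                # D2.4
        assert not (x >= 0 and y >= 0) or x + y >= 0       # D3.2
        # D3.3: by cancellation the only possible witness is x - y.
        assert (x >= y) == (x - y >= 0 and x == y + (x - y))
        assert (not x >= y) == (y >= x and y != x)         # D3.4
        assert (x - y == z) == (x == y + z)                # D4.1
    return True


print(check_domain_axioms(list(range(-3, 4))))
print(check_domain_axioms([Fraction(n, 2) for n in range(-4, 5)]))
```

Both calls run through without a failed assertion, in line with the remark that the integers and the rationals are admissible domains.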

The time domain is $\mathbb{D}$ and the set of all intervals $\mathbb{I}$ is given by:

$\mathbb{I} = \{[a, b] : a, b \in \mathbb{D} \text{ and } (b \ge a)\}$,

where the interval $[a, b]$ is defined as

$[a, b] = \{x \in \mathbb{D} : b \ge x \ge a\}$.

The global variables are assigned meaning through a valuation or value assignment $\mathcal{V} : V \to \mathbb{D}$. Given an interval $[a, b] \in \mathbb{I}$, the meaning of the temporal variables, propositional letters, function and predicate symbols is given by an interpretation function $\mathcal{I}$ such that

1. $\mathcal{I}(0, [a, b]) = 0$,
2. $\mathcal{I}(\ell, [a, b]) = b - a$,
3. $\mathcal{I}(v, [a, b]) \in \mathbb{D}$, for $v \in T$,
4. $\mathcal{I}(X, [a, b]) \in \{tt, ff\}$, for $X \in P$,
5. $\mathcal{I}(f, [a, b]) = \bar{f}$, for an $n$-ary function symbol $f \in F$, where $\bar{f} : \mathbb{D}^n \to \mathbb{D}$ is any standard interpretation of $f$,
6. $\mathcal{I}(G, [a, b]) = \bar{G}$, for an $n$-ary predicate symbol $G \in H$, where $\bar{G} : \mathbb{D}^n \to \{tt, ff\}$ is any standard interpretation of $G$.

Note that “$+$” and “$-$” are interpreted as the associated binary operations on $\mathbb{D}$.

Given a valuation $\mathcal{V}$, the terms are interpreted in the usual way by induction on the length of terms [?].

We shall call the pair $\mathcal{M} = (\mathbb{D}, \mathcal{I})$ an interval model. Let $\mathcal{M}, \mathcal{V}, [a, b] \models A$ denote that the formula $A$ is satisfied in the interval $[a, b]$ (also called the reference interval) with respect to the model $\mathcal{M}$ and valuation $\mathcal{V}$. Satisfiability can then be defined by induction on the formulas in a standard way [?]. We only state the cases for formulas with modalities $\Diamond_l$ and $\Diamond_r$.

1. $\mathcal{M}, \mathcal{V}, [a, b] \models \Diamond_l A$ iff there exists $c$, $a \ge c$, such that $\mathcal{M}, \mathcal{V}, [c, a] \models A$.

2. $\mathcal{M}, \mathcal{V}, [a, b] \models \Diamond_r A$ iff there exists $d$, $d \ge b$, such that $\mathcal{M}, \mathcal{V}, [b, d] \models A$.
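The two clauses can be illustrated by a small evaluator. The sketch below makes a simplifying assumption that is not part of the logic: time is truncated to the integer points 0..N, so that the existential quantifiers become finite searches. Such a truncated domain is not a model of the domain axioms, hence results near the boundary are artefacts of the truncation. Formulas are represented as Python predicates on the end points of the reference interval; all names are hypothetical.

```python
N = 10  # hypothetical bound: the time points are 0, 1, ..., N


def diamond_l(A):
    """<>_l A holds on [a, b] iff A holds on some [c, a] with c <= a."""
    return lambda a, b: any(A(c, a) for c in range(0, a + 1))


def diamond_r(A):
    """<>_r A holds on [a, b] iff A holds on some [b, d] with d >= b."""
    return lambda a, b: any(A(b, d) for d in range(b, N + 1))


def length_is(x):
    """The formula l = x: the reference interval has length x."""
    return lambda a, b: b - a == x


# On [2, 5] there is a right neighbourhood of length 3, namely [5, 8].
print(diamond_r(length_is(3))(2, 5))
# On [4, 7] there is a left neighbourhood of length 2, namely [2, 4].
print(diamond_l(length_is(2))(4, 7))
```

Note that the modalities look only at the end point they expand from: the left modality ignores $b$ and the right modality ignores $a$.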

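We say that $A$ is valid, written as $\models A$, iff for any model $\mathcal{M}$, any valuation $\mathcal{V}$ and interval $[a, b]$, $\mathcal{M}, \mathcal{V}, [a, b] \models A$. Also $A$ is satisfiable iff for some model $\mathcal{M}$, valuation $\mathcal{V}$, and some interval $[a, b]$, $\mathcal{M}, \mathcal{V}, [a, b] \models A$.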

3 The Proof System for NL 

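In the following set of axioms and rules (as well as elsewhere), $\Diamond$ ($\Box$) can be instantiated by either $\Diamond_l$ or $\Diamond_r$ ($\Box_l$ or $\Box_r$ respectively). The following abbreviations will be adopted.

$\bar{\Diamond} \triangleq \Diamond_r$ if $\Diamond = \Diamond_l$, and $\bar{\Diamond} \triangleq \Diamond_l$ if $\Diamond = \Diamond_r$

$\Box \triangleq \neg\Diamond\neg$

$\bar{\Box} \triangleq \neg\bar{\Diamond}\neg$

$\Diamond^c \triangleq \bar{\Diamond}\Diamond$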

Completeness of Neighbourhood Logic 525 



Axioms

A1 Global formulas are not connected to intervals.
   $\Diamond A \Rightarrow A$, provided $A$ is a global formula.

A2 Interval length is non-negative.
   $\ell \ge 0$

A3 Neighbourhood can be of arbitrary length.
   $x \ge 0 \Rightarrow \Diamond(\ell = x)$

A4 Neighbourhood modalities can be distributed over disjunction and existential quantifier.
   $\Diamond(A \vee B) \Rightarrow \Diamond A \vee \Diamond B$
   $\Diamond \exists x . A \Rightarrow \exists x . \Diamond A$
   (The second part of A4 implies that the analogue of the Barcan Formula is true.)

A5 A left (right) neighbourhood coincides with any other left (right) neighbourhood provided they have the same length. In other words, a neighbourhood is determined by its length.
   $\Diamond((\ell = x) \wedge A) \Rightarrow \Box((\ell = x) \Rightarrow A)$

A6 Left (right) neighbourhoods of an interval always start at the same point.
   $\Diamond\bar{\Diamond} A \Rightarrow \Box\bar{\Diamond} A$

A7 Left (right) neighbourhood of the ending (beginning) point of an interval is the interval itself, if it has the same length as the interval.
   $(\ell = x) \Rightarrow (A \Leftrightarrow \Diamond^c((\ell = x) \wedge A))$

A8 Two consecutive left (right) expansions can be replaced by a single left (right) expansion, if the third expansion has a length of the sum of the first two.
   $((x \ge 0) \wedge (y \ge 0)) \Rightarrow (\Diamond((\ell = x) \wedge \Diamond((\ell = y) \wedge \Diamond A)) \Leftrightarrow \Diamond((\ell = x + y) \wedge \Diamond A))$

Rule schemas

M (Monotonicity) If $\phi \Rightarrow \psi$ then $\Diamond\phi \Rightarrow \Diamond\psi$.

N (Necessity) If $\phi$ then $\Box\phi$.

MP (Modus Ponens) If $\phi$ and $\phi \Rightarrow \psi$ then $\psi$.

G (Generalization) If $\phi$ then $(\forall x)\phi$.

The proof system also contains axioms D1-D4 and axioms of propositional logic and first-order predicate logic. They can be taken as any complete system for first-order logic except for some restrictions on the instantiation of quantified formulas. A term $\theta$ is called free for $x$ in $\phi$ if $x$ does not occur freely in $\phi$ within a scope of $\exists y$ or $\forall y$, where $y$ is any variable occurring in $\theta$. We also adopt the following axioms:

$\forall x . \phi(x) \Rightarrow \phi(\theta)$
$\phi(\theta) \Rightarrow \exists x . \phi(x)$

each under the side condition that either $\theta$ is free for $x$ in $\phi(x)$ and $\theta$ is rigid, or $\theta$ is free for $x$ in $\phi(x)$ and $\phi(x)$ is modality free.

A proof of an NL formula $A$ is a finite sequence of NL formulas $A_1, \ldots, A_n$, where $A_n$ is $A$, and each $A_i$ is either an instance of one of the axiom schemas mentioned above or obtained by applying one of the inference rules, also mentioned above, to the previous members of the sequence. We write $\vdash A$ to mean that there exists a proof of $A$ in NL and we say that $A$ is a theorem in NL (or $A$ is provable in NL).

The following is easy to check by induction on the length of proof.

Theorem 1 (Soundness in NL). An NL formula which can be proved in the calculus must be valid (in any interval model).

4 Kripke Completeness 

Kripke Model A Kripke model $\mathcal{K}$ for NL is a quintuple $(W, R_l, R_r, \mathbb{D}, \mathcal{I})$, where

- $W$ is a non-empty set of possible worlds,
- $R_l$ and $R_r$ are binary relations on $W$, called accessibility relations,
- $\mathbb{D}$ is a non-empty set, called the domain,
- $\mathcal{I}$ is an interpretation function which assigns to each symbol $s$ and world $w$ an interpretation $\mathcal{I}(s, w)$ satisfying the following:

  1. If $s$ is an $n$-ary function symbol, then $\mathcal{I}(s, w)$ is a function $\mathbb{D}^n \to \mathbb{D}$.
  2. If $s$ is an $n$-ary predicate symbol, then $\mathcal{I}(s, w)$ is a function $\mathbb{D}^n \to \{tt, ff\}$.
  3. If $s$ is a constant or a temporal variable, then $\mathcal{I}(s, w) \in \mathbb{D}$.
  4. If $s$ is a temporal propositional letter, then $\mathcal{I}(s, w) \in \{tt, ff\}$.
  5. If $s$ is a global symbol, then $\mathcal{I}(s, w_1) = \mathcal{I}(s, w_2)$ for all worlds $w_1, w_2 \in W$, i.e., its interpretation is the same in all worlds.

Semantics Given a Kripke model $\mathcal{K}$, each term $t$ is assigned a meaning in $\mathbb{D}$ in each world of $W$. Given an interpretation $\mathcal{I}$, a valuation $\mathcal{V}$ and a world $w$, the semantics of a term is defined by induction [?] on its length in a standard way and is written as $\mathcal{I}_{\mathcal{V}}(t, w)$. For a rigid term, the interpretation of the term is the same in all worlds.

Now we describe the semantics of the formulas. We shall write $\mathcal{K}, \mathcal{V}, w \models A$ to denote that a formula $A$ is satisfied in the world $w$ under the Kripke model $\mathcal{K}$ and valuation $\mathcal{V}$. It can be defined by induction on formulas in a standard way, with $R_l$ and $R_r$ playing a role similar to the binary accessibility relation in ordinary modal logic [?]. We illustrate the cases for modal operators.

1. $\mathcal{K}, \mathcal{V}, w \models \Diamond_l A$ iff there exists $w' \in W$ such that $R_l(w, w')$ and $\mathcal{K}, \mathcal{V}, w' \models A$.

2. $\mathcal{K}, \mathcal{V}, w \models \Diamond_r A$ iff there exists $w' \in W$ such that $R_r(w, w')$ and $\mathcal{K}, \mathcal{V}, w' \models A$.
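For the propositional fragment, the two modal clauses translate directly into a recursive evaluator. The following Python sketch assumes a hypothetical tuple encoding of formulae and a hypothetical three-world model; it ignores terms, quantifiers and valuations, which the full first-order semantics would need.

```python
def holds(phi, w, model):
    """Truth of a propositional NL formula phi at world w.

    model is a triple (R_l, R_r, val): two sets of world pairs and a map
    from worlds to the temporal propositional letters true there.
    """
    R_l, R_r, val = model
    tag = phi[0]
    if tag == "letter":
        return phi[1] in val[w]
    if tag == "not":
        return not holds(phi[1], w, model)
    if tag == "or":
        return holds(phi[1], w, model) or holds(phi[2], w, model)
    if tag in ("dia_l", "dia_r"):
        rel = R_l if tag == "dia_l" else R_r
        return any(holds(phi[1], w2, model) for (w1, w2) in rel if w1 == w)
    raise ValueError(f"unknown connective: {tag!r}")


def box(tag, phi):
    """Box as the dual of the diamond named by tag: not <> not phi."""
    return ("not", (tag, ("not", phi)))


# Hypothetical model: u is the only R_l-successor of v, w the only R_r-successor.
MODEL = (
    {("v", "u")},
    {("v", "w")},
    {"u": {"X"}, "v": set(), "w": {"Y"}},
)

print(holds(("dia_l", ("letter", "X")), "v", MODEL))
print(holds(("dia_r", ("letter", "X")), "v", MODEL))
```

In this model the letter X is reachable from v through $R_l$ but not through $R_r$.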

We say that $\mathcal{K}$ satisfies a formula $A$ (or $A$ has a Kripke model $\mathcal{K}$), if there are a world $w$ and a valuation $\mathcal{V}$ such that $\mathcal{K}, \mathcal{V}, w \models A$. An NL formula $A$ is valid in a Kripke model $\mathcal{K}$ if for any valuation $\mathcal{V}$ and world $w$, $\mathcal{K}, \mathcal{V}, w \models A$. An NL formula $A$ is valid if $A$ is valid in every Kripke model.

A set $\Gamma$ of sentences is consistent [?] if there does not exist any finite subset $\{A_1, \ldots, A_n\}$ of $\Gamma$ such that $\vdash \neg(A_1 \wedge \ldots \wedge A_n)$. If, in addition, there does not exist any consistent set $\Gamma'$ such that $\Gamma' \supset \Gamma$, then $\Gamma$ is called a maximal consistent set (mcs).

Let $\mathbb{B}$ be a countably infinite set of symbols not occurring in the language
$\mathcal{L}$. Let $\mathcal{L}^+$ be the language obtained by adding to $\mathcal{L}$ all the symbols in $\mathbb{B}$ as
rigid constants. Denote the extended proof system by $NL^+$.

A set $\Gamma$ of sentences is said to have witnesses in $\mathbb{B}$ if for every sentence in $\Gamma$
of the form $\exists x.\phi(x)$ there exists a constant $b \in \mathbb{B}$ such that $\phi(b)$ is in $\Gamma$.

Let Q be a sentence not provable in NL. Suppose $\Gamma = \{\neg Q\}$. It is easy to show
that $\Gamma$ is consistent. Enumerating the sentences of $\mathcal{L}^+$ and adding appropriate
sentences to $\Gamma$ in stages, one can obtain an mcs $\Gamma^* \supset \Gamma$ in $\mathcal{L}^+$ such that $\Gamma^*$ has
witnesses in $\mathbb{B}$ (cf. [?]). Let S be the set of rigid formulas of $\Gamma^*$.

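We shall now construct the desired Kripke model $\mathcal{K}_\Gamma = \langle W, R_l, R_r, \mathbb{D}, \mathcal{I} \rangle$.
Let

$W = \{\Delta : \Delta \text{ is an mcs with witnesses in } \mathbb{B} \text{ and } \Delta \supseteq S\}$.

W is non-empty since $\Gamma^* \in W$. The accessibility relations $R_l$, $R_r$ are defined
as follows.

$R_l(\Delta_1, \Delta_2) \Leftrightarrow \Diamond_l(\Delta_2) \subseteq \Delta_1$ and $R_r(\Delta_1, \Delta_2) \Leftrightarrow \Diamond_r(\Delta_2) \subseteq \Delta_1$.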

The domain $\mathbb{D}$ is defined as follows. On $\mathbb{B}$ define a relation $\equiv$ by
$a \equiv b$ iff $(a = b) \in S$.

The axioms D1 for equality show that $\equiv$ is an equivalence relation on $\mathbb{B}$.
Let

$\mathbb{D} = \{[b] : b \in \mathbb{B}\}$

be the set of equivalence classes, where [a] denotes the equivalence class
containing a.

The interpretation function $\mathcal{I}$ is defined as follows.

1. If v is a temporal variable, then $\mathcal{I}(v, \Delta) = [a]$ iff $(v = a) \in \Delta$.

2. If a is a constant, then $\mathcal{I}(a, \Delta) = [c]$ iff $(a = c) \in \Delta$.

3. If f is an n-ary function symbol, then

$\mathcal{I}(f, \Delta)([b_1], \ldots, [b_n]) = [c]$ iff $(f(b_1, \ldots, b_n) = c) \in \Delta$.

4. If G is an n-ary predicate symbol, then

$\mathcal{I}(G, \Delta)([b_1], \ldots, [b_n]) = tt$ iff $G(b_1, \ldots, b_n) \in \Delta$.

5. If X is a propositional letter, then $\mathcal{I}(X, \Delta) = tt$ iff $X \in \Delta$.

Lemma 1 (Truth Lemma). For any formula $A(x_1, \ldots, x_n)$, where the free
variables in A are among $x_1, \ldots, x_n$, for any world $\Delta \in W$ and valuation v,
$\mathcal{K}_\Gamma, v, \Delta \models A(x_1, \ldots, x_n)$ iff $A(b_1, \ldots, b_n) \in \Delta$, where $v(x_i) = [b_i]$, $1 \le i \le n$.

Since $\neg Q \in \Gamma^*$, by Lemma 1, $\mathcal{K}_\Gamma, v, \Gamma^* \models \neg Q$ for any valuation v. Moreover, if
a sentence A is a theorem of NL then it is in $\Gamma^*$ and so $\mathcal{K}_\Gamma, v, \Gamma^* \models A$. In fact,
not all the axioms are needed in the proof of Kripke completeness [2].



5 Completeness in Interval Models 

We now translate the Kripke model into interval models and prove a completeness
result for interval models.

Consider the Kripke model $\mathcal{K}_\Gamma = \langle W, R_l, R_r, \mathbb{D}, \mathcal{I} \rangle$ such that $\mathcal{K}_\Gamma, v, \Gamma^* \models \neg Q$,
as built earlier. From this Kripke model, through a sequence of steps, we
shall construct an interval model $\mathcal{M} = \langle \mathbb{D}^*, \mathcal{I}^* \rangle$ and an interval [a, b] such that
$\mathcal{M}, v, [a, b] \models \neg Q$ for any valuation v.

Define $\mathbb{D}^* = \mathbb{D}$. It is quite straightforward to check that $\mathbb{D}^*$ satisfies all
the axioms D1 - D4. Let $\Delta_0 \in W$ be such that $(\ell = 0) \in \Delta_0$ and $\Diamond_l(\Delta_0) \subseteq \Gamma^*$.
Such a $\Delta_0$ exists (see [?]). Recall that $\mathbb{D}$ is a set of equivalence classes (of rigid
constants added to $\mathcal{L}$). From now on we shall not distinguish between a and
the equivalence class [a] containing a. Given an interval [a, b], $a, b \in \mathbb{D}$, we shall
construct a world $\Delta_{[a,b]}$ as described below.

Construction of $\Delta_{[a,b]}$. We think of the world $\Delta_0$ as representing 0. We consider
the following cases.

Case 1: $a \ge 0$.

Let $\Delta_1$ be a world in W such that $(\ell = a) \in \Delta_1$ and $\Diamond_r(\Delta_1) \subseteq \Delta_0$. Then $\Delta_{[a,b]}$
is a world such that $(\ell = b - a) \in \Delta_{[a,b]}$ and $\Diamond_r(\Delta_{[a,b]}) \subseteq \Delta_1$.

The existence and uniqueness of such worlds can be established [2]. (Think of
$\Delta_1$ as representing the interval [0, a], which is to the right of 0 represented by
$\Delta_0$. Then $\Delta_{[a,b]}$ represents the interval of length (b - a) to the right of $\Delta_1$; see
Figure 1.)

Case 2: $a < 0$.

Let $\Delta_2, \Delta_3 \in W$ be such that $(\ell = -a) \in \Delta_2$ and $\Diamond_l(\Delta_2) \subseteq \Delta_0$. Also, $(\ell = 0) \in \Delta_3$
and $\Diamond_l(\Delta_3) \subseteq \Delta_2$.

Then $\Delta_{[a,b]}$ is a world such that $(\ell = b - a) \in \Delta_{[a,b]}$ and $\Diamond_r(\Delta_{[a,b]}) \subseteq \Delta_3$.
Such a world $\Delta_{[a,b]}$ can be uniquely found [2]. (Think of $\Delta_2$ as representing the
interval [a, 0], of length -a, which is to the left of 0 represented by $\Delta_0$. Also $\Delta_3$
represents the point interval [a, a]. Then $\Delta_{[a,b]}$ represents the interval of length
(b - a) to the right of $\Delta_3$; see Figure 1.)

Now, define the function $\mathcal{I}^*$ by $\mathcal{I}^*(s, [a, b]) = \mathcal{I}(s, \Delta_{[a,b]})$ for any symbol s.
From the definition of $\mathcal{I}$ it follows that $\mathcal{I}^*(\ell, [a, b]) = \mathcal{I}(\ell, \Delta_{[a,b]}) = b - a$.

We now need the following lemma, which can be proved by induction
on formulas (cf. [?]).

Lemma 2. For any interval [a, b], valuation v and formula A,
$\mathcal{M}, v, [a, b] \models A$ iff $\mathcal{K}_\Gamma, v, \Delta_{[a,b]} \models A$.

We have $\mathcal{K}_\Gamma, v, \Gamma^* \models \neg Q$ for any valuation v. Now it can be shown (see [2]) that
$\Gamma^* = \Delta_{[0,c]}$ for some $c \ge 0$. Thus we have $\mathcal{K}_\Gamma, v, \Delta_{[0,c]} \models \neg Q$ for all
valuations v. Hence by Lemma 2, $\mathcal{M}, v, [0, c] \models \neg Q$ for any valuation v. Thus
Q is not valid. Hence we have the following.

Theorem 2 (Completeness of NL). If a sentence Q is valid in interval
models, then Q is provable in NL.

6 Discussion 

A complete axiomatic system for a first-order interval logic with two neighbourhood
modalities has been presented in this paper. Barua and Zhou [?] have
extended NL by introducing two more modalities in the upward and downward
directions and have proposed a two-dimensional neighbourhood logic $NL^2$. They
have proved a completeness result for $NL^2$ using the same construction. Their
work suggests that it may be possible to obtain a proof system of Neighbourhood
Logic in any dimension using the same technique. The logic $NL^2$ can be used
to specify the behaviour of real-time systems where timeless computation is
taken into account [?].

Fig. 1. Construction of $\Delta_{[a,b]}$ (panel shown: Case 2, a < 0)

In [?], NL has been extended to obtain a Duration Calculus (DC) where
temporal variables are expressed in the form of integrals (durations) of state
variables. It is interesting to note that the proof system of DC is relatively
complete, i.e. it is complete provided all valid NL formulas (with time domain
and valuation domain taken to be the reals) are considered as axioms in the proof
system of DC (cf. [?]).

Applications of NL (and $NL^2$) are being investigated. In [?], NL is combined
with a linear temporal logic to give a real-time semantics for an OCCAM-like
language, where timeless computation is assumed. Further, NL is applied to
Interval Algebra in the area of Artificial Intelligence [?].

Eliminating Recursion in the μ-Calculus*



Martin Otto 
RWTH Aachen 



Abstract. Consider the following problem: given a formula of the modal
μ-calculus, decide whether this formula is equivalently expressible in basic
modal logic. It is shown that this problem is decidable, in fact in
deterministic exponential time. The decidability result can be obtained
through a model theoretic reduction to the monadic second-order theory
of the complete binary tree, which by Rabin's classical result is decidable,
albeit of non-elementary complexity. An improved analysis based
on tree automata yields an exponential time decision procedure.



1 Introduction 

The propositional μ-calculus has, since its introduction in its present form in
[?], emerged as one of the major logical formalisms that can deal with interesting
aspects of the dynamic and temporal behaviour of processes or programs. As
such, it comprises the expressive power of several other well developed logical
formalisms for reasoning about transition systems, among them computation tree
logic CTL and propositional dynamic logic PDL. Conceptually and model
theoretically the μ-calculus is a modal logic. It extends propositional modal logic ML
by a least fixed point operation, and it shares with basic modal logic the crucial
semantic property of being invariant under bisimulation. The least fixed point
construct, which is essentially second-order in nature, adds to modal logic a
powerful, yet tractable form of recursion. It is this aspect of recursion that boosts the
expressiveness of the μ-calculus in allowing it to express truly dynamic features
of transition systems that go far beyond the more static and local properties
expressible in basic modal logic. Liveness, safety or termination conditions are
typical examples of $L_\mu$-definable properties. Given the broad applicability of the
μ-calculus and its fragments for specification and model checking uses, it is
natural to ask whether a formalization of some supposedly interesting
condition on transition systems requires the use of the recursive features of the
μ-calculus in an essential way. It may be that, although some given $L_\mu$-specification
syntactically involves μ-constructs, it is logically equivalent to a much simpler,
static and local assertion in basic modal logic.

* This is an extension of the original submission; the Exptime result, based on an 
automata theoretic analysis, is new here. Moreover, the original model theoretic 
approach has been simplified. I am very grateful to Moshe Vardi for having, with his 
comments on an earlier version, inspired these improvements. 



C. Meinel and S. Tison (Eds.): STACS’99, LNCS 1563, pp. 531-^13 1999. 
(c) Springer-Verlag Berlin Heidelberg 1999

Similar questions about the possibility of eliminating recursion have of course
been asked and investigated in other contexts. The so-called boundedness
problem for Datalog programs, which arises in connection with the issue of database
query optimization, is a case in point. In the context of classical model theory,
boundedness was originally introduced and studied by Barwise and Moschovakis
[?]. Several results in the first-order context and for the applications to Datalog
query optimization show the boundedness problem to be undecidable even in
very restricted settings [?]. Probably the strongest known exception
concerns monadic Datalog (or the boundedness of simple monadic fixed points over
existential first-order formulae without equality or negation), shown to be
decidable in [?].

In the present paper we propose a proof that the eliminability of recursion
from arbitrarily nested $L_\mu$-formulae does indeed constitute a decidable problem.

Main Theorem The following problem is decidable, in fact even in Exptime:
given a formula in the modal μ-calculus, decide whether this formula can
equivalently be expressed in plain modal logic.

By way of interpreting this result in a somewhat wider context, we recall how
ML and $L_\mu$ are characterized as exactly the bisimulation-invariant fragments of
first-order logic and monadic second-order logic, respectively, see [2] and [?].

Theorem (van Benthem) A first-order formula $\varphi(x)$ is equivalent to a formula
of ML if and only if the class of its models, Mod($\varphi$), is closed under bisimulation.

Theorem (Janin, Walukiewicz) A monadic second-order formula $\varphi(x)$ is
equivalent to a formula of $L_\mu$ if and only if Mod($\varphi$) is closed under bisimulation.

In view of these characterizations one may rephrase our main result as
follows: given bisimulation-invariance, the distinction between first-order and
true monadic second-order becomes decidable. This provides a nice analogy
between the bisimulation-invariant scenario and the much more limited scenario
of word structures. For word structures the corresponding distinction is known
to be decidable due to the classical results of Büchi, Elgot, Trakhtenbrot and
Schützenberger, McNaughton, Papert, since it coincides with star-freeness of
regular languages, see [?].

Turning back to the related issue of boundedness, we may look at the
boundedness problem for modal logic as a special case of our decision problem: given a
formula $\varphi(X)$ of modal logic in which X occurs only positively, decide whether
there is some $n \in \mathbb{N}$ such that the least fixed point $\mu_X \varphi(X)$ associated with
$\varphi(X)$ is always reached within n iterations. By a straightforward variation of a
theorem of Barwise and Moschovakis [?], $\varphi(X)$ is bounded if and only if $\mu_X \varphi(X)$
is ML-definable. Thus, our main theorem implies in particular the decidability
of the boundedness problem for modal formulae. This contrasts sharply with the
undecidability of the boundedness problem for two-variable first-order logic as
established in [?], and adds to the comparative study of modal versus
two-variable logics which has emerged in related research, see e.g. [?].

Following preparations in Section 2, we shall complete in Section 3 the proof
of the decidability claim of the main theorem by a reduction to S2S based on
a bounded branching property for the issue of ML-expressibility. In the final
Section 4 we then present an alternative automata theoretic analysis which
furthermore yields the Exptime result.

2 Basic Definitions and Preliminaries 

Tree structures. Since the μ-calculus and modal logic satisfy the tree model
property, it will throughout suffice to consider tree structures rather than
arbitrary Kripke structures. We shall also restrict attention to the notationally
simpler case in which only a single binary transition relation (accessibility
relation) E is present. For us, therefore, a tree structure of type $\tau = \{P_1, \ldots, P_l\}$
is a structure $\mathfrak{A} = (A, E^{\mathfrak{A}}, P_1^{\mathfrak{A}}, \ldots, P_l^{\mathfrak{A}}, 0^{\mathfrak{A}})$ where $(A, E^{\mathfrak{A}}, 0^{\mathfrak{A}})$ is a tree with
root $0^{\mathfrak{A}}$ and the $P_i^{\mathfrak{A}}$ are subsets of A. Here (A, E) being a tree with root a means
that a is the unique element of in-degree zero w.r.t. E and that every element
is reachable from a on a unique directed E-path (whose length is the height of
that element). The following classes of tree structures will be important:

(i) $\mathcal{T}[\tau]$ consisting of all tree structures of type $\tau$;

(ii) $\mathcal{T}_n[\tau] \subseteq \mathcal{T}[\tau]$ consisting of those tree structures whose branching is bounded
by n;

(iii) $\mathcal{T}_{n;m}[\tau] \subseteq \mathcal{T}[\tau]$ consisting of those tree structures whose branching is
bounded by n in all nodes of height less than m.

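For an element a of a tree structure $\mathfrak{A}$ we denote by $\langle a \rangle$ the set of elements of the
subtree rooted at a, by $\mathfrak{A} \restriction \langle a \rangle$ that subtree itself. By $\langle a \rangle^m$ we denote the set of
elements whose height in $\mathfrak{A} \restriction \langle a \rangle$ is at most m, by $\mathfrak{A} \restriction \langle a \rangle^m$ the induced subtree
rooted at a. We denote by $E^{\mathfrak{A}}[a]$ the set of immediate E-successors of a in $\mathfrak{A}$.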

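Prunings and end extensions. If $\mathfrak{A} \subseteq \mathfrak{B}$ and both $\mathfrak{A}$ and $\mathfrak{B}$ are tree structures
we call $\mathfrak{A}$ a pruning of $\mathfrak{B}$ to stress the view that $\mathfrak{A}$ is obtained from $\mathfrak{B}$ through
cutting away subtrees. Note that $\mathfrak{A} \subseteq \mathfrak{B}$ for tree structures implies that $A \subseteq B$
is an initial subset in the tree $\mathfrak{B}$, i.e. $0^{\mathfrak{B}} \in A$ and A is $E^{\mathfrak{B}}$-connected.

We say that $\mathfrak{A}$ is a finite pruning of $\mathfrak{B}$ if there is some $n \in \mathbb{N}$ such that for
all $a \in A$ whose distance from the root is at least n, $E^{\mathfrak{A}}[a] = E^{\mathfrak{B}}[a]$.

$\mathfrak{B}$ is an end extension of the tree $\mathfrak{A}$ if $\mathfrak{A} \subseteq \mathfrak{B}$ and if $E^{\mathfrak{A}}[a] = E^{\mathfrak{B}}[a]$
for all interior nodes (non-leaves) a of $\mathfrak{A}$.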

Propositional modal logic. We write ML for propositional modal logic. ML[$\tau$]
for $\tau = \{P_1, \ldots, P_l\}$ has the following formulae: each $P_i$ is a formula; ML[$\tau$]
is closed under the Boolean connectives $\wedge$, $\neg$ (and $\vee$, which, however, we regard as
defined); and if $\varphi$ is a formula, then so is $\Diamond\varphi$ (and dually $\Box\varphi$, which again
we regard as defined). The semantics is defined over Kripke structures in the
natural way: $(\mathfrak{A}, a) \models P_i$ if $a \in P_i^{\mathfrak{A}}$; the Boolean connectives behave as usual;
and $(\mathfrak{A}, a) \models \Diamond\varphi$ if there is some $a' \in E^{\mathfrak{A}}[a]$ for which $(\mathfrak{A}, a') \models \varphi$. If $\mathfrak{A}$ is a
tree structure of the appropriate type, with root $0^{\mathfrak{A}}$, we simply write $\mathfrak{A} \models \varphi$ for
$(\mathfrak{A}, 0^{\mathfrak{A}}) \models \varphi$. That is, we always regard the root as the distinguished element of a
tree structure unless otherwise specified.
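
The semantic clauses for ML translate directly into a recursive evaluator over a finite tree structure. The sketch below is illustrative only and rests on assumptions not found in the paper: formulae are encoded as nested tuples, `succ` maps a node to its list of immediate E-successors, and `label` maps a node to the set of predicate names holding there. The function name `holds` is hypothetical.

```python
def holds(formula, node, succ, label):
    """Evaluate an ML formula at a node of a finite tree structure.

    formula: ("prop", name) | ("not", f) | ("and", f, g) | ("dia", f)
    succ:    dict mapping a node to the list of its immediate successors
    label:   dict mapping a node to the set of predicate names true there
    """
    kind = formula[0]
    if kind == "prop":
        return formula[1] in label[node]
    if kind == "not":
        return not holds(formula[1], node, succ, label)
    if kind == "and":
        return (holds(formula[1], node, succ, label)
                and holds(formula[2], node, succ, label))
    if kind == "dia":
        # diamond: some immediate successor satisfies the subformula
        return any(holds(formula[1], child, succ, label)
                   for child in succ.get(node, []))
    raise ValueError(f"unknown connective: {kind}")


# Hypothetical example tree with root 0.
succ = {0: [1, 2], 1: [3]}
label = {0: set(), 1: {"P1"}, 2: set(), 3: {"P2"}}

dia_p1 = ("dia", ("prop", "P1"))
box_p1 = ("not", ("dia", ("not", ("prop", "P1"))))   # box defined via diamond
dia_dia_p2 = ("dia", ("dia", ("prop", "P2")))

at_root = [holds(f, 0, succ, label) for f in (dia_p1, box_p1, dia_dia_p2)]
```

Treating $\Box$ as the defined connective $\neg\Diamond\neg$ follows the convention stated in the paragraph above, so the evaluator needs no separate clause for it.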

The propositional μ-calculus. $L_\mu$ augments the syntax and semantics of ML by
means of a monadic least fixed point constructor. A formula $\varphi(X)$ is positive
in the monadic second-order variable X if X only occurs in the scope of an
even number of negations. In this case $\varphi(X)$ induces a monotone operator on
subsets P of $\tau$-structures $\mathfrak{A}$ according to $P \mapsto \{a \in A \mid (\mathfrak{A}, a) \models \varphi(P)\}$.

Owing to its monotonicity, this operator has a least fixed point $[\mu_X \varphi(X)]^{\mathfrak{A}}$,
which may also be obtained as the limit of the monotone sequence of its stages,
$P_\alpha = \{a \in A \mid (\mathfrak{A}, a) \models \varphi(\bigcup_{\beta<\alpha} P_\beta)\}$. $\mu_X \varphi(X)$ is itself a formula of $L_\mu$
with semantics according to $(\mathfrak{A}, a) \models \mu_X \varphi(X)$ if $a \in [\mu_X \varphi(X)]^{\mathfrak{A}}$.
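
On a finite structure the sequence of stages stabilizes after finitely many steps, so the least fixed point can be computed by plain iteration. The following sketch assumes the monotone operator is supplied as a Python function on sets; the helper name `least_fixed_point` and the example formula $\varphi(X) = P \vee \Diamond X$ on a four-element chain are illustrative choices, not taken from the paper.

```python
def least_fixed_point(operator):
    """Iterate a monotone set operator from the empty set.

    Returns the least fixed point and the number of stages that were
    needed before the sequence became stationary.
    """
    stage, steps = frozenset(), 0
    while True:
        nxt = frozenset(operator(stage))
        if nxt == stage:
            return stage, steps
        stage, steps = nxt, steps + 1


# Hypothetical chain 0 -> 1 -> 2 -> 3 with the predicate P true only at 3.
succ = {0: [1], 1: [2], 2: [3], 3: []}
P = {3}


def phi(X):
    """Operator induced by phi(X) = P or diamond X."""
    return {a for a in succ if a in P or any(b in X for b in succ[a])}


fixed_point, steps = least_fixed_point(phi)
```

Here the stages are {3}, {2, 3}, {1, 2, 3} and {0, 1, 2, 3}, so the fixed point is reached after four iterations. On longer chains the number of iterations grows with the length, which is the kind of unbounded behaviour the boundedness problem asks about.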

The following observation will be useful in the analysis of $L_\mu$-formulae. It
follows from monotonicity considerations.

Observation 1 If $\varphi(X)$ is positive in X and if $P \subseteq A$ is any stage of $\mu_X \varphi(X)$
over $\mathfrak{A}$, then $(\mathfrak{A}, a) \models \varphi(P) \Leftrightarrow (\mathfrak{A}, a) \models \varphi(P \setminus \{a\})$.



Relativization. It is useful to associate with classes $\mathcal{C}$ of tree structures of type
$\tau$, and with a unary predicate $U \notin \tau$, the class of all those tree structures of
type $\tau \cup \{U\}$ for which the root is an element of the U-part, and for which the
tree structure of type $\tau$ induced on the largest initial subset contained in the
U-part is a member of $\mathcal{C}$. We call this derived class the relativization of $\mathcal{C}$ to
U and denote it $\mathcal{C}^U$. It is easy to see that $\mathcal{C}^U$ is ML-definable or $L_\mu$-definable,
respectively, if $\mathcal{C}$ is so definable. In fact the following inductively defined
U-relativization of $L_\mu$-formulae provides the desired formulae: $\varphi^U = U \wedge \varphi$ for
atomic $\varphi$; $(\neg\varphi)^U = U \wedge \neg\varphi^U$; $(\varphi_1 \wedge \varphi_2)^U = \varphi_1^U \wedge \varphi_2^U$; $(\Diamond\varphi)^U = U \wedge \Diamond\varphi^U$;
$(\mu_X \varphi(X))^U = U \wedge \mu_X \varphi^U(X)$. Note that the translation $\varphi \mapsto \varphi^U$ increases the
length only linearly.
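
The inductive clauses of the U-relativization can be sketched as a syntactic transformation. The tuple encoding of formulae and the names `relativize` and `size` are assumptions made for this illustration; in particular, treating fixed point variables as atomic (so that X becomes U ∧ X) is an assumed reading of "atomic" here.

```python
def relativize(f, u_name="U"):
    """U-relativization of a formula given as a nested tuple.

    Encoding (assumed): ("prop", name), ("var", name), ("not", f),
    ("and", f, g), ("dia", f), ("mu", variable_name, body).
    """
    u = ("prop", u_name)
    kind = f[0]
    if kind in ("prop", "var"):                 # atomic case
        return ("and", u, f)
    if kind == "not":
        return ("and", u, ("not", relativize(f[1], u_name)))
    if kind == "and":
        return ("and", relativize(f[1], u_name), relativize(f[2], u_name))
    if kind == "dia":
        return ("and", u, ("dia", relativize(f[1], u_name)))
    if kind == "mu":
        return ("and", u, ("mu", f[1], relativize(f[2], u_name)))
    raise ValueError(f"unknown connective: {kind}")


def size(f):
    """Number of tuple nodes in the formula tree."""
    return 1 + sum(size(g) for g in f[1:] if isinstance(g, tuple))


# mu X. (P and diamond X), relativized to U
phi = ("mu", "X", ("and", ("prop", "P"), ("dia", ("var", "X"))))
phi_u = relativize(phi)
```

Each clause adds at most two nodes (one conjunction and one occurrence of U) per node of the input, so in this encoding `size(relativize(f))` never exceeds `3 * size(f)`, in line with the linear growth noted above.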

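Bisimulation. Two tree structures $\mathfrak{A}$ and $\mathfrak{B}$ with roots $0^{\mathfrak{A}}$ and $0^{\mathfrak{B}}$ are
bisimulation equivalent, $\mathfrak{A} \sim \mathfrak{B}$, if there is an $R \subseteq A \times B$ such that $(0^{\mathfrak{A}}, 0^{\mathfrak{B}}) \in R$ and
for all $(c, d) \in R$:

$c \in P_i^{\mathfrak{A}} \Leftrightarrow d \in P_i^{\mathfrak{B}}$ for all $P_i$,

$\forall c' \in E^{\mathfrak{A}}[c] \; \exists d' \in E^{\mathfrak{B}}[d] : (c', d') \in R$,
$\forall d' \in E^{\mathfrak{B}}[d] \; \exists c' \in E^{\mathfrak{A}}[c] : (c', d') \in R$.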


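We shall also deal with finite approximations of bisimulation equivalence
in the form of n-bisimulation equivalence $\sim_n$, which for tree structures can be
characterized as follows: $\mathfrak{A} \sim_n \mathfrak{B}$ if and only if $\mathfrak{A} \restriction \langle 0^{\mathfrak{A}} \rangle^n \sim \mathfrak{B} \restriction \langle 0^{\mathfrak{B}} \rangle^n$.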
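
For finite trees, this characterization suggests a simple test: compute for each root a canonical "type" describing the tree up to height n, in which successors are collected as a set so that duplicated subtrees collapse, and compare the two types. The sketch below uses this standard idea; the encoding of trees as `succ` and `label` dictionaries and the function names are assumptions for illustration.

```python
def type_upto(node, n, succ, label):
    """Canonical description of the subtree at `node`, cut at height n."""
    props = frozenset(label[node])
    if n == 0:
        return (props, frozenset())
    children = succ.get(node, [])
    return (props,
            frozenset(type_upto(c, n - 1, succ, label) for c in children))


def n_bisimilar(root_a, succ_a, label_a, root_b, succ_b, label_b, n):
    """True iff the two trees, cut at height n, are bisimulation equivalent."""
    return (type_upto(root_a, n, succ_a, label_a)
            == type_upto(root_b, n, succ_b, label_b))


# Hypothetical trees: A has two identical P-successors below the root,
# B has a single P-successor which itself has a Q-successor.
succ_a = {0: [1, 2]}
label_a = {0: set(), 1: {"P"}, 2: {"P"}}
succ_b = {0: [1], 1: [2]}
label_b = {0: set(), 1: {"P"}, 2: {"Q"}}

same_up_to_1 = n_bisimilar(0, succ_a, label_a, 0, succ_b, label_b, 1)
same_up_to_2 = n_bisimilar(0, succ_a, label_a, 0, succ_b, label_b, 2)
```

The two trees agree up to height 1 although their branching differs, and are separated at height 2, where only B has a node below its P-successor.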

The model theoretic proof of the decidability result in our main theorem
relies on a restriction of the issue to some subclass $\mathcal{T}_{n;m}$ of initially n-branching
trees. Let us say that some model theoretic condition holds in restriction to
$\mathcal{T}_{n;*}$ if this condition is true in restriction to $\mathcal{T}_{n;m}$ for some (and hence for all
sufficiently large) m.



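Lemma 2 For a bisimulation-closed class $\mathcal{C} \subseteq \mathcal{T}[\tau]$ and for any n the following
are equivalent:

(i) $\mathcal{C}$ is ML-definable in restriction to $\mathcal{T}_{n;*}$.

(ii) there is some $m \in \mathbb{N}$ such that for any two tree structures $\mathfrak{A}$ and $\mathfrak{A}'$ in $\mathcal{T}_{n;m}$:
if $\mathfrak{A} \restriction \langle 0^{\mathfrak{A}} \rangle^m \sim \mathfrak{A}' \restriction \langle 0^{\mathfrak{A}'} \rangle^m$ then $\mathfrak{A} \in \mathcal{C} \Leftrightarrow \mathfrak{A}' \in \mathcal{C}$.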

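Sketch of proof. (i) $\Rightarrow$ (ii) follows directly from the fact that a modal formula of
quantifier rank m is insensitive to parts of the structure whose distance from the
distinguished node (the root) is greater than m. For (ii) $\Rightarrow$ (i) we observe first
that, for any given m, there are only finitely many m-bisimulation classes of
$\tau$-trees, each of which is characterized by a single ML-formula of quantifier rank m.
So $\mathcal{C}$ is ML-definable over $\mathcal{T}_{n;m}$ as a finite union of ML-definable m-bisimulation
types, provided we can show that (ii) and closure of $\mathcal{C}$ under bisimulation together
imply that $\mathcal{C}$ is actually m-bisimulation closed over $\mathcal{T}_{n;m}$. Let to this end
$\mathfrak{A}, \mathfrak{B} \in \mathcal{T}_{n;m}$, $\mathfrak{B} \sim_m \mathfrak{A}$, $\mathfrak{A} \in \mathcal{C}$; we show that $\mathfrak{B} \in \mathcal{C}$. From $\mathfrak{A} \sim_m \mathfrak{B}$ we may obtain
an $\mathfrak{A}' \in \mathcal{T}_{n;m}$, through duplication of subtrees rooted within $\langle 0 \rangle^m$ in $\mathfrak{A}$, with
the following properties: $\mathfrak{A}' \sim \mathfrak{A}$ and $\mathfrak{A}' \sim_m \mathfrak{B}$ via a bisimulation R between
$\mathfrak{A}' \restriction \langle 0 \rangle^m$ and $\mathfrak{B} \restriction \langle 0 \rangle^m$ that is the graph of a function f from $\mathfrak{A}' \restriction \langle 0 \rangle^m$
onto $\mathfrak{B} \restriction \langle 0 \rangle^m$. Let now $\mathfrak{A}''$ be the result of replacing, for all $c \in A'$ at height
m, each $\mathfrak{A}' \restriction \langle c \rangle$ by the corresponding $\mathfrak{B} \restriction \langle f(c) \rangle$. It follows that $\mathfrak{A}'' \sim \mathfrak{B}$,
$\mathfrak{A}'' \restriction \langle 0 \rangle^m \sim \mathfrak{A}' \restriction \langle 0 \rangle^m$, $\mathfrak{A}' \sim \mathfrak{A}$. Now $\mathfrak{A} \in \mathcal{C}$ by assumption, $\mathfrak{A}' \in \mathcal{C}$ by $\sim$-closure,
$\mathfrak{A}'' \in \mathcal{C}$ by (ii), and therefore finally $\mathfrak{B} \in \mathcal{C}$ by $\sim$-closure again. □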

It is easy to see that condition (ii) of the lemma is further equivalent to
the following condition on expansions of the complete n-ary tree: there is a
finite initial subset V of the complete n-branching tree $T_n$ such that for all initial
$W \subseteq V$ and all $P_1, \ldots, P_l \subseteq W$: either all end extensions of $(T_n, P_1, \ldots, P_l) \restriction W$
are in $\mathcal{C}$, or none is. Omitting the straightforward application of well-known
interpretation techniques, we note that this condition is expressible in monadic
second-order logic over the complete n-branching tree, provided $\mathcal{C}$ itself is monadic
second-order definable (as is clearly the case for $L_\mu$-definable classes). But by
Rabin's famous theorem the monadic second-order theories of the complete
n-branching trees are all decidable, in fact uniformly in n, since all these
theories are uniformly interpretable in that of the complete binary tree, S2S.
Thus we have the following for the restriction of the ML-expressibility issue to
classes $\mathcal{T}_{n;*}$.

Proposition 3 The following decision problem is decidable via reduction to S2S,
uniformly in n and in the vocabulary of $\varphi$: given $\varphi \in L_\mu$ and $n \in \mathbb{N}$, decide
whether there is a formula in ML that is equivalent to $\varphi$ in restriction to $\mathcal{T}_{n;*}$.

3 Prunings, Preservation, and Decidability via S2S 

Prunings offer a canonical means to govern the branching degree of tree
structures. The idea is to associate with each node a set of properties which are
relevant for its direct E-successors, and to consider those prunings which, at
each node a, retain sufficiently many immediate successors so as to still realize
the same relevant properties in the remaining successors of a.

Definition 4 Let $\Gamma$ be a set of classes of tree structures of type $\tau$. A pruning
$\mathfrak{B} \subseteq \mathfrak{A}$ is called $\Gamma$-elementary if for all $b \in B$ and for all $\mathcal{C} \in \Gamma$: if there is
some $a \in E^{\mathfrak{A}}[b]$ such that $\mathfrak{A} \restriction \langle a \rangle \in \mathcal{C}$, then there also is an $a \in E^{\mathfrak{B}}[b]$ such that
$\mathfrak{A} \restriction \langle a \rangle \in \mathcal{C}$.



Observation 5 Given $\Gamma$ and $\mathfrak{A}$, there is a $\Gamma$-elementary pruning of $\mathfrak{A}$ whose
branching degree is bounded by $|\Gamma|$.
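
The bound in this observation can be made concrete for finite trees: at each retained node it suffices to keep, for every class, one successor whose original subtree lies in that class. The sketch below assumes that class membership of the subtree rooted at a node is decided by caller-supplied predicates; the function name `elementary_pruning` and the example data are hypothetical, and arbitrary (infinite) tree structures are outside its scope.

```python
def elementary_pruning(root, succ, classes):
    """Return the successor map of a pruning of a finite tree.

    At every retained node b and for every predicate in `classes`, one
    successor of b whose ORIGINAL subtree satisfies the predicate is kept,
    provided such a successor exists. Each node keeps at most
    len(classes) successors.
    """
    pruned = {}
    todo = [root]
    while todo:
        b = todo.pop()
        keep = []
        for in_class in classes:
            if any(in_class(a) for a in keep):
                continue                  # an already kept successor is a witness
            witness = next((a for a in succ.get(b, []) if in_class(a)), None)
            if witness is not None:
                keep.append(witness)
        pruned[b] = keep
        todo.extend(keep)
    return pruned


# Hypothetical example: membership is decided by the label at the subtree root.
succ = {0: [1, 2, 3, 4, 5]}
label = {0: "-", 1: "P", 2: "P", 3: "P", 4: "Q", 5: "Q"}
classes = [lambda a: label[a] == "P", lambda a: label[a] == "Q"]

pruned = elementary_pruning(0, succ, classes)
```

In the example the root keeps successors 1 and 4, one witness per class. Note that the predicates are evaluated on subtrees of the original tree, as the definition of a $\Gamma$-elementary pruning requires.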



Definition 6 Let $\Gamma$ be a set of classes of tree structures of type $\tau$. A class
$\mathcal{C}_0 \subseteq \mathcal{T}[\tau]$ is finitely preserved with respect to $\Gamma$ if for all $\Gamma$-elementary finite
prunings $\mathfrak{A} \subseteq \mathfrak{B}$, $\mathfrak{B} \in \mathcal{C}_0$ iff $\mathfrak{A} \in \mathcal{C}_0$. $\Gamma$ is a preservation set if each $\mathcal{C} \in \Gamma$ is
finitely preserved w.r.t. $\Gamma$.

These notions apply in the context of the well-known small branching
property for $L_\mu$ which plays a role in satisfiability considerations, see e.g. [?] and
compare the so-called Fischer-Ladner closure of $\varphi$ from [?] and others.

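Proposition 7 Any $L_\mu$-definable class $\mathcal{C} = \mathrm{Mod}(\varphi)$ is a member of some
preservation set $\Gamma_\varphi$ of $L_\mu$-definable classes, whose size is bounded by the length of $\varphi$.

Sketch of proof. By induction with respect to the structure of the defining formula
$\varphi \in L_\mu$. We first argue that Mod($\varphi$) is finitely preserved w.r.t. some $\Gamma_\varphi$ consisting
of Mod($\varphi$) and fewer than $|\varphi|$ other $L_\mu$-definable classes; identifying these classes
with their defining $L_\mu$-formulae, we regard $\Gamma_\varphi$ as a subset of $L_\mu$, with $\varphi \in \Gamma_\varphi$.
The atomic case is obvious with $\Gamma_\varphi = \{\varphi\}$; negation is dealt with by putting
$\Gamma_{\neg\varphi} = \Gamma_\varphi \cup \{\neg\varphi\}$ (the complement of $\mathcal{C}$ is finitely preserved w.r.t. $\Gamma$ if $\mathcal{C}$ itself
is). Similarly we may put $\Gamma_{\varphi_1 \wedge \varphi_2} = \Gamma_{\varphi_1} \cup \Gamma_{\varphi_2} \cup \{\varphi_1 \wedge \varphi_2\}$. Modal quantification
is covered in $\Gamma_{\Diamond\varphi} = \Gamma_\varphi \cup \{\Diamond\varphi\}$.

Finally, let $\varphi = \mu_X \psi(X)$. Note that $\Gamma_{\psi(X)}$ is a set of formulae in a vocabulary
involving X as a basic proposition. Let $\Gamma[\mu_X\psi/X]$ be the result of substituting
$\mu_X\psi(X)$ for every occurrence of X in all formulae in $\Gamma$. We claim that $\Gamma_\varphi =
\Gamma_{\psi(X)}[\mu_X\psi/X] \cup \{\varphi\}$ is good for $\varphi$. We have to show that $\varphi$ itself as well as any
other member of $\Gamma_\varphi$ is finitely preserved w.r.t. $\Gamma_\varphi$. We first argue for formulae
other than $\varphi$, i.e. for $\chi[\mu_X\psi/X]$ for $\chi(X) \in \Gamma_{\psi(X)}$. Under the assumption that
$\mu_X\psi(X)$ itself is preserved, preservation for $\chi[\mu_X\psi/X]$ is inherited from the
corresponding preservation property of $\Gamma_\psi$: it corresponds to the special case of
the latter in which X happens to be interpreted as $\mu_X\psi(X)$. It remains to show
that $\mu_X\psi(X)$ is finitely preserved w.r.t. $\Gamma_\varphi$. Since we are claiming preservation
only w.r.t. finite prunings, we may prove the preservation claim by induction over
subtrees and may consider a pruning in just one single node. Assume that $\mathfrak{B} \subseteq \mathfrak{A}$
is a $\Gamma_\varphi$-elementary pruning obtained from $\mathfrak{A}$ through deletion of subtrees rooted
in elements $a' \in E^{\mathfrak{A}}[a]$. Using the assumption that $(\mathfrak{B}, b) \models \mu_X\psi(X)$ iff $(\mathfrak{A}, b) \models
\mu_X\psi(X)$, for all $b \in \langle a \rangle^{\mathfrak{B}} \setminus \{a\}$, we find that the pruning $(\mathfrak{B}, [\mu_X\psi]^{\mathfrak{B}} \setminus \{a\}, a) \subseteq
(\mathfrak{A}, [\mu_X\psi]^{\mathfrak{A}} \setminus \{a\}, a)$ is $\Gamma_{\psi(X)}$-elementary. This gives the desired preservation of
$\varphi$ at a, through an application of Observation 1. □

Lemma 8 Let $\mathcal{C} \subseteq \mathcal{T}[\tau]$ be a member of a preservation set $\Gamma$. If $\mathcal{C}$ is
ML-definable in restriction to $\mathcal{T}_{n;*}$ for $n = 2 \cdot |\Gamma|$, then $\mathcal{C}$ is ML-definable over $\mathcal{T}[\tau]$.

Proof. Let $n = 2 \cdot |\Gamma|$ and assume that $\mathcal{C}$ is ML-definable in restriction to $\mathcal{T}_{n;*}$.
Suppose, towards a contradiction, that $\mathcal{C}$ were not ML-definable over $\mathcal{T}[\tau]$. By
Lemma 2 we may therefore find tree structures $\mathfrak{A}$ and $\mathfrak{A}'$ such that $\mathfrak{A} \in \mathcal{C}$, $\mathfrak{A}' \notin \mathcal{C}$,
and $\mathfrak{A} \restriction \langle 0 \rangle^m \sim \mathfrak{A}' \restriction \langle 0' \rangle^m$. W.l.o.g. assume that $\mathfrak{A} \restriction \langle 0 \rangle^m = \mathfrak{A}' \restriction \langle 0' \rangle^m$, and
that $\mathfrak{A}$ and $\mathfrak{A}'$ are disjoint beyond height m. Let U and V be monadic predicates
not in $\tau$ and put $\hat\tau := \tau \cup \{U, V\}$. Let $\hat\Gamma = \{\mathcal{C}^U \mid \mathcal{C} \in \Gamma\} \cup \{\mathcal{C}^V \mid \mathcal{C} \in \Gamma\}$ be the set
of classes obtained by relativizing those in $\Gamma$ to U or V, respectively. Note that
$|\hat\Gamma| = n$. Let $\mathfrak{B}$ be the tree structure of type $\hat\tau$ obtained from $\mathfrak{A} \cup \mathfrak{A}'$ by putting
$U^{\mathfrak{B}} := A$ and $V^{\mathfrak{B}} := A'$. Let $\mathfrak{B}_0 \subseteq \mathfrak{B}$ be a $\hat\Gamma$-elementary finite pruning of $\mathfrak{B}$
such that $\mathfrak{B}_0 \in \mathcal{T}_{n;m}[\hat\tau]$ (cf. Observation 5). Note that in $\mathfrak{B}_0$, U and V still are
initial subsets. Let $\mathfrak{A}_0$ be the tree structure of type $\tau$ obtained as the restriction
of $\mathfrak{B}_0$ to $U^{\mathfrak{B}_0}$, $\mathfrak{A}'_0$ similarly induced by the restriction to $V^{\mathfrak{B}_0}$. Now, $\mathfrak{A}_0$ and $\mathfrak{A}'_0$
are in particular $\Gamma$-elementary finite prunings of $\mathfrak{A}$ and $\mathfrak{A}'$, respectively, whence
$\mathfrak{A}_0 \in \mathcal{C}$ and $\mathfrak{A}'_0 \notin \mathcal{C}$. Obviously still $\mathfrak{A}_0 \restriction \langle 0 \rangle^m = \mathfrak{A}'_0 \restriction \langle 0' \rangle^m$. But this contradicts
the assumption that $\mathcal{C}$ was ML-definable in restriction to $\mathcal{T}_{n;m}[\tau]$, by Lemma 2.
□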



Corollary 9 For $\varphi \in L_\mu$, there is an equivalent formula in ML if and only if $\varphi$
is equivalent to some formula of ML in restriction to $\mathcal{T}_{n;*}$ for $n = 2|\varphi|$.

With Proposition 3 this yields the decidability claim of the main theorem.

4 Tree Automata and Exponential Time Complexity 

The following is a variant of Lemma 2 which is in fact easier inasmuch as
branching degrees are disregarded. Let $\mathcal{C} \subseteq \mathcal{T}[\tau]$, $U \notin \tau$, and $\mathcal{C}^U$ the corresponding
relativization. Let $\Delta\mathcal{C}$ be the class of those $\mathfrak{A} \in \mathcal{T}_{fin}[\tau \cup \{U\}]$ that have two
different end extensions $\mathfrak{B}_1$, $\mathfrak{B}_2$ such that $\mathfrak{B}_1 \in \mathcal{C}^U$, $\mathfrak{B}_2 \notin \mathcal{C}^U$.

The tallness of a tree structure is the minimum over the heights of the leaves.
A class of (finite) tree structures is of bounded tallness if there is a uniform finite
bound on the tallness of its members.
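
As a small illustration of this definition, the tallness of a finite tree can be found by breadth-first search, since the first leaf reached has minimal height. The dictionary encoding of the tree and the function name `tallness` are assumptions made for this sketch.

```python
from collections import deque


def tallness(root, succ):
    """Minimum height over all leaves of a finite tree.

    succ maps a node to the list of its immediate successors; nodes
    without an entry (or with an empty list) are leaves.
    """
    queue = deque([(root, 0)])
    while queue:
        node, height = queue.popleft()
        children = succ.get(node, [])
        if not children:
            return height          # BFS reaches a leaf of minimal height first
        queue.extend((child, height + 1) for child in children)


# Hypothetical tree: leaves are node 2 (height 1) and node 4 (height 3).
succ = {0: [1, 2], 1: [3], 3: [4]}
result = tallness(0, succ)
```

The example tree has one long branch and one short one; its tallness is determined by the short branch alone, which is why a class can contain arbitrarily large trees and still be of bounded tallness.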

Lemma 10 For any bisimulation-closed $\mathcal{C} \subseteq \mathcal{T}[\tau]$, and with the induced classes
$\mathcal{C}^U$ and $\Delta\mathcal{C}$ as above, the following are equivalent:

(i) $\mathcal{C}$ is ML-definable.

(ii) there is some $m \in \mathbb{N}$ such that for any two tree structures $\mathfrak{A}$ and $\mathfrak{A}'$ in $\mathcal{T}[\tau]$:
if $\mathfrak{A} \restriction \langle 0^{\mathfrak{A}} \rangle^m \sim \mathfrak{A}' \restriction \langle 0^{\mathfrak{A}'} \rangle^m$ then $\mathfrak{A} \in \mathcal{C} \Leftrightarrow \mathfrak{A}' \in \mathcal{C}$.

(iii) $\Delta\mathcal{C}$ is of bounded tallness.

For $L_\mu$-definable $\mathcal{C}$ we want to view condition (iii) in an automata theoretic
setting, having $\Delta\mathcal{C}$ accepted by some suitable tree automaton. First, however,
we need to present finite tree structures and some information about their end
extensions in a way that fits automata.

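A $\Lambda$-labelling of a (naked) tree $\mathfrak{A} \in \mathcal{T}_{fin}[\emptyset]$ is a mapping $\lambda: A \to \Lambda$. Clearly,
any tree structure in $\mathcal{T}_{fin}[\tau]$ may be coded as a $\Lambda$-labelled tree for $\Lambda = \mathcal{P}(\tau)$,
by putting $\lambda(a) = \{P_i \in \tau \mid \mathfrak{A}, a \models P_i\}$. A leaf labelling on $\mathfrak{A}$ is a labelling
defined on the set of leaves of $\mathfrak{A}$ rather than the entire universe. We may regard
a finite tree structure of type $\tau$ together with a leaf labelling $\pi$ in alphabet $\Sigma$
as a (naked) finite tree with a labelling in the alphabet $\Lambda = \mathcal{P}(\tau) \times (\Sigma \cup \{*\})$,
where the label of every interior node a lies in $\mathcal{P}(\tau) \times \{*\}$. It will therefore
suffice to deal with the format given by

$\mathcal{T}_{fin}[\Lambda] = \{T = (\mathfrak{A}, \lambda) \mid \mathfrak{A} \in \mathcal{T}_{fin}[\emptyset], \; \lambda: A \to \Lambda \text{ a labelling}\}$.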

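A (deterministic leaves-to-root) tree automaton over $\mathcal{T}_{fin}[\Lambda]$ is given as $\mathcal{A} =
(Q, \delta)$ where Q is the finite set of states, and $\delta$ a transition function of the form
$\delta: \mathcal{P}(Q) \times \Lambda \to Q$. Its run on $T \in \mathcal{T}_{fin}[\Lambda]$ is described by an induced labelling
$\rho: A \to Q$ defined inductively according to $\rho(a) = \delta(\{\rho(a') \mid a' \in E^{\mathfrak{A}}[a]\}, \lambda(a))$.
The tree language accepted by $\mathcal{A}$ w.r.t. some $F \subseteq Q$ is $L(\mathcal{A}, F) = \bigcup_{q \in F} L(\mathcal{A}, q)$
where

$L(\mathcal{A}, q) = \{T \in \mathcal{T}_{fin}[\Lambda] \mid \rho(0) = q \text{ for the run } \rho \text{ of } \mathcal{A} \text{ on } T\}$.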
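
The inductive definition of the run can be sketched as a recursion from the leaves to the root, with the transition function applied to the set (not the sequence) of successor states, so that the branching degree plays no role. The encoding of trees as dictionaries, the two-state example automaton and all function names below are hypothetical.

```python
def run(node, succ, label, delta):
    """State reached at `node`: delta applied to the SET of successor states."""
    child_states = frozenset(run(c, succ, label, delta)
                             for c in succ.get(node, []))
    return delta(child_states, label[node])


def accepts(root, succ, label, delta, final_states):
    """True iff the state at the root belongs to the set F of final states."""
    return run(root, succ, label, delta) in final_states


# Hypothetical automaton with Q = {"yes", "no"}: the state is "yes" iff some
# node of the subtree carries the label "P".
def delta(child_states, lab):
    return "yes" if lab == "P" or "yes" in child_states else "no"


succ = {0: [1, 2], 2: [3]}
label_with_p = {0: "-", 1: "-", 2: "-", 3: "P"}
label_without_p = {0: "-", 1: "-", 2: "-", 3: "-"}

accepted = accepts(0, succ, label_with_p, delta, {"yes"})
rejected = accepts(0, succ, label_without_p, delta, {"yes"})
```

Because `delta` receives a set of states, the same automaton runs unchanged on trees of any branching degree; a leaf simply sees the empty set of successor states.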

For an application as an acceptor of trees whose branching degree is bounded
by some n (which is the standard format used in the literature), we may of course
transcribe $\delta$ into a function $\delta^n: \bigcup_{m \le n} Q^m \times \Lambda \to Q$. Let $\mathcal{A}^n = (Q, \delta^n)$ be this
specialization of $\mathcal{A}$ to trees of n-bounded branching.

Observation 11 The size of $\mathcal{A} = (Q, \delta)$ is $|\delta| \approx |\Lambda| \cdot 2^{|Q|}$. Its restriction $\mathcal{A}^n$ to
trees of n-bounded branching is of size $|\delta^n| \approx |\Lambda| \cdot |Q|^n$.

It is one of the main points in our application, though, that the branching
should not a priori be bounded. The exponential blow-up between $\mathcal{A}^n$ for fixed n
and $\mathcal{A}$ itself is the reason that we shall want to deal with a special subspecies of
the above kind of automata, for which the transition function can be given in a
more compact format. Assume that $\mathcal{A} = (Q, \delta)$ where Q is of the form $Q \subseteq \mathcal{P}(\Gamma)$
for some finite set $\Gamma$, Q closed under union. We say that $\mathcal{A} = (Q, \delta)$ is of $\cup$-type
w.r.t. $\Gamma$ if, for all $q \subseteq Q$ and for all $r \in \Lambda$, the value of $\delta(q, r)$ only depends on
$\bigcup q$. The following is then straightforward.

Lemma 12 Let $\mathcal{A} = (Q, \delta)$ be of $\cup$-type w.r.t. $\Gamma$ and let $|\Gamma| = n$. Then, for any
$F \subseteq Q$, $L(\mathcal{A}, F)$ is of bounded tallness iff $L(\mathcal{A}^n, F)$ is of bounded tallness.

An inspection of typical results for standard (fixed branching) tree automata
shows that these carry over to our slightly more general notion of tree automata,
with the above notion of size. We refer in particular to M. Vardi's discussion of
tree automata and their applications in [?] and to the handbook article by
W. Thomas for background.

Theorem 13 Bounded tallness of $L(\mathcal{A}, F)$ is decidable in time polynomial in
the size of $\mathcal{A}$.

For a fixed finite set $\Gamma \subseteq L_\mu[\tau]$ let $\mathrm{tp}^\Gamma(\mathfrak{A}) = \{\psi \in \Gamma \mid \mathfrak{A} \models \psi\}$, and
$\mathrm{Tp}^\Gamma(\mathfrak{A}, a) = \bigcup_{a' \in E^{\mathfrak{A}}[a]} \mathrm{tp}^\Gamma(\mathfrak{A} \restriction \langle a' \rangle)$. Consider an end extension $\mathfrak{B}$ of a finite
tree structure $\mathfrak{A}$ of type $\tau$. With $\mathfrak{B}$ associate the leaf labelling that
maps a leaf a of $\mathfrak{A}$ to $\pi(a) = \mathrm{Tp}^\Gamma(\mathfrak{B}, a)$. If $\Gamma$ is a preservation set, then the
leaf labelling induced by an end extension $\mathfrak{B}$ of $\mathfrak{A}$ fully determines $\mathrm{tp}^\Gamma(\mathfrak{B} \restriction \langle a \rangle)$
for all $a \in A$. Indeed, a tree automaton can compute these types from the leaf
labelling over $\mathfrak{A}$. Note that $(\mathfrak{A}, \pi)$ is a $\Sigma_\Gamma$-leaf-labelled tree structure where $\Sigma_\Gamma =
\{\mathrm{Tp}^\Gamma(\mathfrak{B}, b) \mid \mathfrak{B} \in \mathcal{T}[\tau]\} \subseteq \mathcal{P}(\Gamma)$. If $\Gamma$ is a set of at most n $L_\mu$-formulae whose
length is at most n, then this alphabet itself is recognizable in time exponential
in n: $p \in \Sigma_\Gamma$ if and only if the $L_\mu$-formula $\bigwedge_{\psi \in p} \Diamond\psi \wedge \bigwedge_{\psi \in \Gamma \setminus p} \neg\Diamond\psi$ is satisfiable.

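But $L_\mu$-satisfiability is in Exptime due to [?]. Note that the necessity to make
this labelling alphabet explicit eventually turns our automata theoretic decision
procedure into a reduction to $L_\mu$-satisfiability. We code $(\mathfrak{A}, \pi)$ as a tree $T \in
\mathcal{T}_{fin}[\Lambda_\Gamma]$, where $\Lambda_\Gamma \subseteq \mathcal{P}(\tau) \times (\mathcal{P}(\Gamma) \cup \{*\})$. Let in this sense $[\mathfrak{A}; \mathfrak{B}]^\Gamma$ stand for the
$T \in \mathcal{T}_{fin}[\Lambda_\Gamma]$ associated with an end extension $\mathfrak{B}$ of $\mathfrak{A}$. The following extends
Proposition 7. The inductive proof, which is an elaboration of that given for
Proposition 7 above, is omitted here.

Proposition 14 For every $\varphi \in L_\mu[\tau]$ there is a preservation set $\Gamma = \Gamma_\varphi$ of size
$|\Gamma_\varphi| \le |\varphi|$ and with $\varphi \in \Gamma_\varphi$, and an automaton $\mathcal{A}_\Gamma$ with state set $Q_\Gamma = \mathcal{P}(\Gamma)$, of
$\cup$-type w.r.t. $\Gamma$, such that for all end extensions $\mathfrak{B}$ of $\mathfrak{A}$: $[\mathfrak{A}; \mathfrak{B}]^\Gamma \in L(\mathcal{A}_\Gamma, q) \Leftrightarrow \mathrm{tp}^\Gamma(\mathfrak{B}) = q$.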

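We turn to criterion (iii) from Lemma 10. Let $\varphi \in L_\mu[\tau]$ and consider two
end extensions $\mathfrak{B}_1$ and $\mathfrak{B}_2$ of $\mathfrak{A} \in \mathcal{T}_{fin}[\tau \cup \{U\}]$. For $\Delta\mathcal{C}$ we are interested in the
case that $\mathfrak{B}_1 \models \varphi^U$ and $\mathfrak{B}_2 \models \neg\varphi^U$. The $\mathfrak{B}_i$ induce two leaf labellings on $\mathfrak{A}$ w.r.t.
$\Gamma = \Gamma_{\varphi^U}$, which we may code into one with a labelling alphabet consisting of
the product of the original $\Sigma_\Gamma$ with itself. This turns the triple $(\mathfrak{A}, \mathfrak{B}_1, \mathfrak{B}_2)$ into
a labelled tree

$[\mathfrak{A}; \mathfrak{B}_1; \mathfrak{B}_2] \in \mathcal{T}_{fin}[\Lambda^*]$ where $\Lambda^* = \mathcal{P}(\tau) \times ((\Sigma_\Gamma \times \Sigma_\Gamma) \cup \{*\})$.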

Consider now an automaton A which simulates in parallel two copies of the 
automaton Ap of Pronosition I 1 41 one working with the first component of the leaf 
labelling, the other with the second component. The natural way of performing 
this parallel simulation uses a state set Q*p := Qp x Qp, so that A*p operates 
like Ap in both components. Identifying F(F) x F(F) with V{F x {1,2}), and 
writing F* for F x {1,2}, we may regard Ap as of U-type w.r.t. F*. 

Now AC consists of those 21 for which some [21; 25i; *82] is accepted by A* = 
A*p in some state {q,q') where qA G q and <p^ ^ q' . Therefore, the tallness 
problem for A* turns out to settle ML-expressibility of ip, by Lemma E3 



$$\varphi \text{ is expressible in ML} \iff L(A^*, F) \text{ is of bounded tallness},$$
where $F = \{(q, q') \mid \varphi \in q,\ \varphi \notin q'\}$.

Note that the size of $A^*$ is doubly exponential in $|\varphi|$. But by Lemma
we may equivalently consider the tallness problem for the automaton for 

n=|r*| = 2- |r|, because A* is of U-type. By Observation ITTI and Theorem ESI 
bounded tallness of (A*)'^ is decidable in simply exponential time in |T|, and 
hence in \tp\. This proves the Exptime bound in the main theorem. 

This bound is essentially optimal, since there is a straightforward reduction 
of L^-satisfiability to ML-expressibility: if ^ is not ML-expressible, and if 
U and V are not in ^ or cp, then ip is unsatisfiable if and only if the formula 
<>p^ A is equivalent to a formula in ML (namely to T).

Abstract. A deterministic algorithm $O$ accepting a language $L$ is called
(polynomially) optimal if for any algorithm $A$ accepting $L$ there is a
polynomial $p$ such that $\mathrm{time}_O(x) \le p(|x| + \mathrm{time}_A(x))$ for every $x \in L$.
It is shown that an optimal acceptor for a language L exists if there 
is a p-optimal proof system for L. If L is a p-cylinder also the inverse 
implication holds. This result widely generalizes work of Krajíček and
Pudlák, who showed the result for L = TAUT. It is further shown how
to construct an optimal acceptor for a p-cylinder L, given an acceptor 
for L which runs fast on every easy subset of L. Then we investigate 
the relationship of this notion of an ‘optimal acceptor’ to a more general 
notion of optimality. Here, instead of considering time-complexity on 
each individual string x, worst-case time-bounds are considered. It is 
observed that every set complete for exponential time under linearly 
length-bounded polynomial-time many-one reducibility has an acceptor 
with an optimal time-bound whereas on the other hand no set hard 
for exponential time under polynomial-time many-one reducibility has 
a p-optimal proof system. Finally we show how these results can be 
translated to nondeterministic algorithms and optimal proof systems. 



1 Introduction 

The major aim in the development of algorithms for hard sets is to decrease
the runtime. A related line of research is to design heuristics which perform
well on important instances, or to identify efficiently decidable subsets
(see, e.g., [?]). It seems ambitious to ask for an algorithm which has, in some
sense, the fastest possible runtime on every input: an algorithm which runs fast
on easy instances and, in some sense, includes all possible heuristics for the
problem, even those which are not yet known. However, Levin [?] proved that such
an optimal algorithm exists for the functional task of finding witnesses for
elements of a given set in NP (cf. Theorem 1). For example, using the random
access machine (RAM) model of computation, one can construct an algorithm $O$
which finds satisfying assignments for formulas $\varphi \in \mathrm{SAT}$ such
that for any other algorithm $A$ solving the same task, there is a constant $c$
with $\mathrm{time}_O(x) \le c\,(\mathrm{time}_A(x) + |x|)$ for every
$x \in \mathrm{SAT}$ ($O$ may not halt on other inputs). One can rephrase Levin's
result in terms of inverting polynomial time computable functions as follows
(again we state the result using the RAM model).



C. Meinel and S. Tison (Eds.): STACS’99, LNCS 1563, pp. 541-^^21 1999. 
(c) Springer- Verlag Berlin Heidelberg 1999 






Jochen Messner 



Theorem 1 ([?]). For each (partial) function $h$ computable by a RAM in
polynomial time $p$ there is a RAM $M$ inverting $h$ such that for every RAM $M'$
inverting $h$ there is a constant $c > 0$ with
$\mathrm{time}_M(y) \le c \cdot (\mathrm{time}_{M'}(y) + p(|M'(y)|))$
for every $y$ in the range of $h$.

It is noted in [?] that one can transfer the result to the Turing machine model
if one replaces the term $c \cdot (\ldots)$ above by $c' \cdot ((\ldots) + \log(\ldots))$.

To study the existence of optimal algorithms in a more machine-independent
fashion it is suitable to use the following definition of optimality. Intuitively,
for some task to solve, let us call an algorithm $O$ optimal for this task on
instances from $S \subseteq \Sigma^*$ if for any other algorithm $A$ solving the
same task there is a polynomial $p$ such that
$\mathrm{time}_O(x) \le p(\mathrm{time}_A(x) + |x|)$ for every $x \in S$. Using the
Turing machine model of computation (as we will do in the rest of the paper)
one can formalize this intuition as follows.

Definition 1. Let $\mathcal{A}$ be a collection of Turing machines, let
$S \subseteq \Sigma^*$. A Turing machine $M \in \mathcal{A}$ is called polynomially
time-optimal (short: optimal) for $\mathcal{A}$ on $S$ if for any $M' \in \mathcal{A}$
there is a polynomial $p$ such that
$\mathrm{time}_M(x) \le p(\mathrm{time}_{M'}(x) + |x|)$ for each $x \in S$.

The definition implies that any task which can be solved in polynomial time has
an optimal algorithm. As a consequence of Theorem 1 one obtains

Corollary 1. For each (partial) function $f \in \mathrm{FP}$ there is a Turing transducer
which is optimal on the range of $f$ for the deterministic transducers inverting $f$.

Contrasting with this functional task, in this paper we primarily investigate the
existence of an algorithm which is optimal on $L$ for the deterministic algorithms
accepting the language $L$. For short, such an algorithm is called an optimal
acceptor for $L$. We show that the existence of an optimal acceptor for $L$ is
closely related to the existence of a p-optimal proof system for $L$. In [?] Cook
and Reckhow considered a function $h \in \mathrm{FP}$ with range $L$ as an (abstract) proof
system for $L$. A proof system $h$ for $L$ is called p-optimal if every proof system $f$
for $L$ is p-simulated by $h$, which means that there is a function $g \in \mathrm{FP}$ such that
$h(g(x)) = f(x)$ for any $x$ in the domain of $f$. Connections between the existence
of p-optimal proof systems and other complexity theoretical notions have also
been studied in [3, ?]. Krajíček and Pudlák showed in [?] that an
optimal acceptor for TAUT exists if, and only if, there is a p-optimal proof
system for TAUT. Recently, using an idea from [?], Sadowski [?] showed that
the result holds also for SAT instead of TAUT. A main objective of this paper
is to generalize the result to further languages $L$. In fact, using further ideas
from [?], we prove that for any language $L$, an optimal acceptor for $L$ exists if
there is a p-optimal proof system for $L$. The reverse implication is proved under
the assumption that $L$ is a p-cylinder.

Schnorr shows in [12] that for self-reducible sets $L$, the complexity of the
functional problem, that is, the problem of finding witnesses for membership in
$L$, is closely related to the complexity of the decision problem. Using the result
of Levin he is able to construct 'optimal' acceptors for self-reducible problems
like, e.g., SAT. However, the notion of optimality used there is more general
than the notion of optimality considered above. Instead of considering the
time-complexity of the algorithms on each individual string, time-bounds are
considered. A function $t : \mathbb{N} \to \mathbb{N}$ is called a time-bound for an algorithm $A$
on $S \subseteq \Sigma^*$ if $\mathrm{time}_A(x) \le t(n)$ for every $x \in S$ with $|x| \le n$.
Let us call $t$ a time-bound for the set $L$ if $t$ is a time-bound on $L$ for a
deterministic Turing machine accepting $L$. If, in addition, for any time-bound $s$
for $L$ there is a polynomial $p$ such that $t(n) \le p(s(n))$, $t$ is called an optimal
time-bound for $L$. Rephrasing the result in [?], one can construct an acceptor for
SAT with an optimal time-bound. Note however that such an acceptor may have
exponential run-time on every instance from SAT, even on those instances which
can be solved efficiently by some known algorithm. On the other hand, under the
assumption P = NP this algorithm has a polynomial time-bound on instances from SAT
(the algorithm may not halt on other inputs).

The relation between both notions of optimality is examined in Section 5.
There we show that any deterministic time class $\mathrm{DTIME}(t(n))$ determined by
a time-constructible function $t$ with $\mathrm{P} \subsetneq \mathrm{DTIME}(t(n))$ contains a set that has
no optimal acceptor but an optimal time-bound. This implies, for example, that
no set $\le^p_m$-hard for exponential time has a p-optimal proof system. On the
other hand it is shown that any set complete for exponential time under linearly
length-bounded many-one reducibility has an optimal time-bound. In Section 3
we show a relationship between the performance of acceptors or proof systems
on easy instances from $L$, and the existence of optimal acceptors or p-optimal
proof systems for $L$. Some implications stated in the main theorem in Section 4
are already proved there. Finally, in Section 6 we briefly discuss how the results
can be transferred to nondeterministic algorithms and optimal proof systems.

Due to the limited space several proofs are shortened or omitted in this 
version of the paper. 

2 Preliminaries 

We assume some familiarity with standard notions of computational complexity
theory, and refer the reader to books like [2] for notions not defined in this paper.
Let $\Sigma$ be some fixed finite alphabet containing 0 and 1. The output of a Turing
transducer $M$ on input $x \in \Sigma^*$ is denoted by $M(x)$; we write $M(x) = \bot$ if
$M$ does not accept or runs forever on input $x$. Similarly, for a partial function
$f : \Sigma^* \to \Sigma^*$ we write $f(x) = \bot$ if $f$ is undefined on $x$; a transducer $M$
computes $f$ if $f(x) = M(x)$ for every $x \in \Sigma^*$. Let FP denote the class of
partial functions computed by transducers which have a polynomial time bound
on $\Sigma^*$. Let $h$ be a function with range $R \subseteq \Sigma^*$. For a function $f$ (and also for
a transducer $M$ computing $f$) we say that $f$ (resp. $M$) inverts $h$ on $S \subseteq R$ if
$h(f(y)) = y$ for $y \in S$. If additionally $f \in \mathrm{FP}$ we say that $h$ is FP-invertible on
$S$. Following [2], a function $h$ is called 1-invertible (in polynomial time) if there is
a function, denoted $h^{-1}$, in FP such that $h^{-1}(h(x)) = x$ for $x \in \Sigma^*$; $h$ is called
length-increasing if $|h(x)| > |x|$ for any $x$; we call $h$ linearly length-bounded if
$|h(x)| \le c \cdot |x|$ for some constant $c$ and any $x$. Given a subset $S$ of the domain
of $f$, $f(S)$ denotes the set $\{f(x) \mid x \in S\}$. A set $A \subseteq \Sigma^*$ many-one reduces to
$B$ via a total function $f \in \mathrm{FP}$ (in symbols: $A \le^p_m B$) if for all $x \in \Sigma^*$, $x \in A$
if, and only if, $f(x) \in B$. If, additionally, $f$ is 1-invertible with range $\Sigma^*$, which
means $B \le^p_m A$ via $f^{-1}$, $A$ and $B$ are called p-isomorphic. A set $A$ which is
p-isomorphic to $A \times \Sigma^*$ is called a p-cylinder. The following lemma shows the
property of p-cylinders that makes the notion useful for this paper.

Lemma 1. $L$ is a p-cylinder if, and only if, any set which is $\le^p_m$-reducible to
$L$ is $\le^p_m$-reducible to $L$ via a 1-invertible, length-increasing function.

A proof of the lemma is found in [2]. It is further shown there that $L$ is a
p-cylinder if, and only if, $L$ is 1-invertible paddable ($L$ is called 1-invertible paddable
if there is a 1-invertible function $g \in \mathrm{FP}$ such that $g(\langle x, y \rangle) \in L$ if, and only if,
$x \in L$ for all $x, y \in \Sigma^*$).

Let E denote the class $\mathrm{DTIME}(2^{O(n)})$ and let NE be its nondeterministic
counterpart.

3 Optimality and Easy Tasks 

In this section we show several constructions that relate the performance of
acceptors or proof systems on easy instances from $L$ to the existence of optimal
acceptors or p-optimal proof systems. It is easy to see that (intuitively stated)
an optimal acceptor runs fast on easy instances from $L$ (see Proposition 1);
similarly, when considering a p-optimal proof system we obtain that proofs for easy
instances are easy to compute (see Proposition 2). More surprising is probably
that an inverse version of these statements also holds when $L$ is a p-cylinder:
given a proof system for $L$ which allows one to compute proofs for easy instances
easily, one obtains a p-optimal proof system for $L$ (see Proposition 3). Similarly,
given an acceptor for $L$ which runs fast on every easy instance one obtains an
optimal acceptor for $L$. The proof of the latter statement is the most involved of
these constructions and is delayed to the proof of the main theorem in the next
section.

Proposition 1. Let $M$ be an optimal deterministic acceptor for $L$, and let $S$ be
a subset of $L$ with $S \in \mathrm{P}$. Then there is a polynomial $p$ such that
$\mathrm{time}_M(x) \le p(|x|)$ for $x \in S$.

Proposition 2. A p-optimal proof system for $L$ is FP-invertible on any subset
$S$ of $L$ with $S \in \mathrm{P}$.

Proof. Let $h$ be a p-optimal proof system for $L$ and let $S \subseteq L$, $S \in \mathrm{P}$. Let $g$ be
a proof system defined as follows:

$$g(w) = \begin{cases} h(v) & \text{if } w = 0v, \\ v & \text{if } w = 1v \text{ and } v \in S, \\ \bot & \text{otherwise.} \end{cases}$$

As $h$ is p-optimal there is a function $f \in \mathrm{FP}$ such that $h(f(w)) = g(w)$ for any
$w \in \Sigma^*$. Now, let $r$ be a polynomial time computable function with $r(x) = f(1x)$
for $x \in S$. One easily checks that $h(r(x)) = h(f(1x)) = g(1x) = x$ for any $x \in S$.
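
The auxiliary proof system g used in this proof can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `make_g`, the toy proof system `h` (for the language of even-length binary strings) and the predicate `in_S` are hypothetical stand-ins, and the undefined value is modelled by `None`. Nothing here produces a p-optimal system; the sketch only shows how g embeds the easy set S next to h.

```python
from typing import Callable, Optional

ProofSystem = Callable[[str], Optional[str]]

def make_g(h: ProofSystem, in_S: Callable[[str], bool]) -> ProofSystem:
    """Build g with g(0v) = h(v), g(1v) = v if v is in S, undefined otherwise."""
    def g(w: str) -> Optional[str]:
        if w.startswith("0"):
            return h(w[1:])
        if w.startswith("1") and in_S(w[1:]):
            return w[1:]
        return None  # plays the role of "undefined"
    return g

# Toy instance (assumption): L = binary strings of even length.
def h(w: str) -> Optional[str]:
    # Every even-length string is its own proof; odd-length proofs are invalid.
    return w if len(w) % 2 == 0 else None

def in_S(v: str) -> bool:
    # A small, easily decidable subset S of L.
    return v in {"", "00"}

g = make_g(h, in_S)
```

For x in S the string 1x is a g-proof of x that is trivial to compute, which is exactly the property the proof transfers to h through the simulating function f.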

For any set $L$ let $T(L)$ be defined by

$$T(L) = \{\langle M, x, 0^s \rangle \mid x \in \Sigma^*,\ s \ge 0,\ M \text{ is a det. Turing transducer, and } \mathrm{time}_M(x) \le s \text{ implies } M(x) \in L\},$$

and fixing some transducer $M$ let

$$T(L)_M = \{\langle M, x, 0^s \rangle \in T(L) \mid x \in \Sigma^*,\ s \ge 0\}.$$

Notice that if $L$ is the range of the function computed by the transducer $M$ then
$T(L)_M$ is the trivial language $\{\langle M, x, 0^s \rangle \mid x \in \Sigma^*,\ s \ge 0\}$.

In [?] it had been observed that $T(L)$ is polynomial time many-one equivalent
to $L$ for $L \ne \Sigma^*$. We state this observation in Lemma 2. It shows that in some
sense $T(L)$ is the hardest set which is $\le^p_m$-reducible to $L$: any set $\le^p_m$-reducible
to $L$ is reducible to $T(L)$ via a very simple many-one reduction.

Lemma 2. For any set $L \subseteq \Sigma^*$ the following holds.

• $T(L) \le^p_m L$ if $L \ne \Sigma^*$.

• $A \le^p_m L$ via $f$ implies that $A$ is reducible to $T(L)$ via $g : x \mapsto \langle M, x, 0^{p(|x|)} \rangle$,
where $M$ is a transducer computing $f$ in polynomial time $p$.
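
The definition of T(L) and the simple reduction in the second item can be made concrete with a small Python sketch. The modelling choices are assumptions made for illustration only: a transducer is a generator function in which each `yield` counts as one step, the triple ⟨M, x, 0^s⟩ is a Python tuple with the integer s in place of 0^s, and `run_bounded`, `in_T` and `reduce_to_T` are hypothetical helper names.

```python
from typing import Callable, Generator, Optional, Tuple

Transducer = Callable[[str], Generator[None, None, str]]

def run_bounded(M: Transducer, x: str, s: int) -> Optional[str]:
    """Return M(x) if M halts on x within s steps, otherwise None."""
    computation = M(x)
    for _ in range(s + 1):
        try:
            next(computation)          # advance the machine by one step
        except StopIteration as halt:
            return halt.value          # output of the halted machine
    return None

def in_T(in_L: Callable[[str], bool], M: Transducer, x: str, s: int) -> bool:
    """Membership of (M, x, 0^s) in T(L): time_M(x) <= s implies M(x) in L."""
    out = run_bounded(M, x, s)
    return out is None or in_L(out)

def reduce_to_T(M: Transducer, p: Callable[[int], int]):
    """The reduction g: x -> (M, x, 0^{p(|x|)}) from the second item of the lemma."""
    return lambda x: (M, x, p(len(x)))

# Toy instance (assumption): L = even-length strings,
# A = strings with an even number of 1s, f deletes all 0s.
def M(x: str) -> Generator[None, None, str]:
    out = []
    for ch in x:
        yield                          # one step per input symbol
        if ch == "1":
            out.append(ch)
    return "".join(out)

in_L = lambda w: len(w) % 2 == 0
g = reduce_to_T(M, lambda n: n)        # M runs in time p(n) = n
```

Because the step budget p(|x|) always suffices for M to halt, membership of g(x) in T(L) depends only on whether M(x) lies in L, which is why g is a correct reduction from A.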

It is interesting to note that Lemmas 1 and 2 imply that $T(L)$ is a p-cylinder.

We will now see that an inverse version of Proposition 2 holds. In the proof of
Proposition 3 we show how to construct a p-optimal proof system for a p-cylinder
$L$ given a proof system for $L$ which is FP-invertible on every easy subset of $L$.
Let us first state the following lemma which holds for any language $L$.

Lemma 3. Let $g$ be a proof system for $T(L)$ such that for any transducer $M$
computing a proof system for $L$, $g$ is FP-invertible on $T(L)_M$. Then there is a
p-optimal proof system for $L$.

Proof. We will observe that the following algorithm computes a p-optimal proof
system $h$.

    input $\langle M_1, M_2, x, 0^s, 0^n \rangle$
    if $\mathrm{time}_{M_1}(\langle M_2, x, 0^s \rangle) \le n$ then
        let $w$ be the output of $M_1$ on input $\langle M_2, x, 0^s \rangle$
        if $g(w) = \langle M_2, x, 0^s \rangle$ and $\mathrm{time}_{M_2}(x) \le s$ then
            output $M_2(x)$ and halt;
    otherwise reject.

Notice that $g(w) = \langle M_2, x, 0^s \rangle$ implies $\langle M_2, x, 0^s \rangle \in T(L)$; then $M_2(x) \in L$
if $\mathrm{time}_{M_2}(x) \le s$. This shows that the range of $h$ is a subset of $L$. We now
show that $h$ p-simulates any proof system $f$ for $L$. By assumption there is a
transducer $M_2$ computing $f$ in polynomial time $p$ and a function $r \in \mathrm{FP}$ such
that $g(r(\langle M_2, x, 0^s \rangle)) = \langle M_2, x, 0^s \rangle$ for all $x$ and $s$. Let $M_1$ be a transducer
computing $r$ in polynomial time $q$. Now $f$ is p-simulated by $h$ via the translation
$x \mapsto \langle M_1, M_2, x, 0^s, 0^n \rangle$ where $s = p(|x|)$ and $n = q(|\langle M_2, x, 0^s \rangle|)$.

One may also replace the proof system $g$ in the proof above by a suitable
acceptor for $T(L)$, which yields the following lemma.

Lemma 4. Let $R$ be a Turing acceptor for $T(L)$ which has a polynomial
time-bound on $T(L)_M$ for any transducer $M$ computing a proof system for $L$. Then
there is a p-optimal proof system for $L$.

When $L$ is a p-cylinder one can replace the set $T(L)$ in Lemma 3 above by
$L$ itself, which yields Proposition 3.

Proposition 3. Let $L$ be a p-cylinder, and let $h$ be a proof system for $L$ which
is FP-invertible on any $S \in \mathrm{P}$ with $S \subseteq L$. Then there is a p-optimal proof
system for $L$.

Proof. Let $f$ be a 1-invertible reduction from $T(L)$ to $L$. We will show that a
function $g \in \mathrm{FP}$ with $g(\langle x, w \rangle) = x$ if $f(x) = h(w)$ is a proof system for $T(L)$
which fulfills the conditions of Lemma 3. If a transducer $M$ computes a proof
system for $L$ we have $T(L)_M \in \mathrm{P}$. Then $S = f(T(L)_M)$ is also in P as $f$ is
1-invertible. As $h$ is FP-invertible on $S$ there is a function $r \in \mathrm{FP}$ such that
$h(r(y)) = y$ for $y \in S$. Observe $g(\langle x, r(f(x)) \rangle) = x$ for $x \in T(L)_M$.

Again a similar result holds using an acceptor.

Proposition 4. Let $L$ be a p-cylinder, and let $R$ be an acceptor for $L$ which has
a polynomial time-bound on every $S \in \mathrm{P}$ with $S \subseteq L$. Then a p-optimal proof
system for $L$ exists.

Notice that the implications in Propositions 3 and 4 can be generalized
to further languages $L$. For the proofs to go through it suffices that there is a
$\le^p_m$-reduction $f$ from $T(L)$ to $L$ such that $f(T(L)_M) \in \mathrm{P}$ for every machine $M$
computing a proof system for $L$. On the other hand the author does not believe
that any recursively enumerable language which has a maximal subset in P has
an optimal acceptor. This would show that the implications in Propositions 3
and 4 do not hold for every non-p-levelable set (see [?] for definitions).

4 Optimal Algorithms and Optimal Proof Systems 

We now complete the proof of the main theorem. The theorem generalizes the
result of Krajíček and Pudlák [?] for TAUT to any p-cylinder.

Theorem 2. For a p-cylinder $L$, the following statements are equivalent.

1. There is a p-optimal proof system for $L$.

2. There is a proof system for $L$ which is FP-invertible on any $S \in \mathrm{P}$ with
$S \subseteq L$.

3. There is an optimal acceptor for $L$.

4. There is an acceptor for $L$ which has a polynomial time-bound on every
$S \in \mathrm{P}$ with $S \subseteq L$.

In addition, the implications $1 \Rightarrow 2$, $3 \Rightarrow 4$, and $1 \Rightarrow 3$ hold for any language $L$.

The proof of the theorem is obtained by several constructions, some of which
were already given in the previous section. In more detail, the implications
$1 \Rightarrow 2$, $3 \Rightarrow 4$, $2 \Rightarrow 1$, $4 \Rightarrow 1$, and $1 \Rightarrow 3$ are proved by
Propositions 2, 1, 3, 4, and 5, respectively. The most involved of these
constructions is given by the implication $1 \Rightarrow 3$, which we prove now.

We need the following lemma from [?].

Lemma 5. If $A$ has a p-optimal proof system then any set $\le^p_m$-reducible to $A$
has a p-optimal proof system, too.

In an intuitive sense we can use a p-optimal proof system for $L$ to certify
efficiently and uniformly the fact that a machine $M_i$ accepts only strings from
$L$. This intuition is reflected in the proof of the following lemma.

Lemma 6. Let $L$ be a set possessing a p-optimal proof system. Then there is a
recursive function mapping any transducer $M_i$ to a transducer $M_i'$ such that

$$L(M_i') = L(M_i) \cap L$$

and, if $L(M_i) \subseteq L$, there is a polynomial $p$ such that for any $x \in \Sigma^*$

$$\mathrm{time}_{M_i'}(x) \le p(\mathrm{time}_{M_i}(x) + |x|).$$

Proof. It suffices to prove the result for $L \ne \Sigma^*$. Let

$$A(L) = \{\langle M, x, 0^s \rangle \mid x \in L \text{ if the DTM } M \text{ accepts } x \text{ in } \le s \text{ steps}\}.$$

Observe $A(L) \le^p_m L$. Therefore, assuming the existence of a p-optimal proof
system for $L$, there is a p-optimal proof system $h \in \mathrm{FP}$ for $A(L)$ by Lemma 5.
Let $I_h$ be an optimal transducer inverting $h$, which is given by Corollary 1.
On input $x$ the machine $M_i'$ first proceeds like $M_i$. If $M_i$ rejects $x$, $M_i'$ rejects.
If $M_i$ accepts $x$ in $s$ steps then $M_i'$ runs $I_h$ on input $\langle M_i, x, 0^s \rangle$ and accepts $x$
iff $I_h$ produces some output. Because $I_h$ inverts $h$, $I_h$ produces an output on
input $\langle M_i, x, 0^s \rangle$ if, and only if, $\langle M_i, x, 0^s \rangle$ is in the range $A(L)$ of $h$. Therefore
$L(M_i') = L(M_i) \cap L$. Now assume $L(M_i) \subseteq L$. Observe that $\langle M_i, x, 0^s \rangle \in A(L)$
for any $x \in \Sigma^*$, $s \ge 0$. By Proposition 2 there is a function $r \in \mathrm{FP}$ such that
$h(r(\langle M_i, x, 0^s \rangle)) = \langle M_i, x, 0^s \rangle$ for $x \in \Sigma^*$, $s \ge 0$. Due to the optimality of $I_h$ this
implies that $\mathrm{time}_{I_h}(\langle M_i, x, 0^s \rangle) \le q(|\langle M_i, x, 0^s \rangle|)$ for all $x \in \Sigma^*$, $s \ge 0$, and some
polynomial $q$. As $\mathrm{time}_{M_i'}(x) = t + \mathrm{time}_{I_h}(\langle M_i, x, 0^t \rangle)$ where $t = \mathrm{time}_{M_i}(x)$ we
obtain $\mathrm{time}_{M_i'}(x) \le p(\mathrm{time}_{M_i}(x) + |x|)$ for some polynomial $p$ and any $x \in \Sigma^*$.



Proposition 5. If $L$ has a p-optimal proof system then an optimal acceptor for
$L$ exists.

Proof. Let $M_1', M_2', \ldots$ be the recursive enumeration of acceptors for subsets of
$L$ that can be obtained by Lemma 6 from a standard enumeration $M_1, M_2, \ldots$ of
deterministic Turing machines. Let $U$ be some universal Turing machine which
on input $\langle i, x \rangle$ can simulate $s$ steps of $M_i'$ on input $x$ in time $c_i s^2 + c_i$ ($U$ first
constructs a suitable encoding of $M_i'$ in less than $c_i$ steps).

On input $x$ the optimal acceptor Opt proceeds in stages. Every second stage is
spent simulating one step of $U$ on input $\langle 1, x \rangle$, every 4th stage is spent
simulating one step of $U$ on $\langle 2, x \rangle$, every 8th stage is spent on $\langle 3, x \rangle$, and so
on. In general, in stage $2^{i-1} + 2^i (j - 1)$, $j \ge 1$, the $j$th step of $U$ on $\langle i, x \rangle$ is
simulated. If in some simulation a machine accepts, Opt also accepts and halts
(rejects are discarded). Note that stage $n$ can be performed in time $O(n)$.
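
The stage numbering used by Opt can be checked mechanically. Since $2^{i-1} + 2^i(j-1) = 2^{i-1}(2j-1)$, every stage number $n \ge 1$ corresponds to exactly one pair $(i, j)$: $i-1$ is the number of trailing zero bits of $n$, and $2j-1$ is its odd part. The following Python sketch (function names `stage` and `schedule` are chosen here for illustration) implements both directions of this correspondence.

```python
from typing import Tuple

def stage(i: int, j: int) -> int:
    """Stage in which the j-th step of machine i is simulated (i, j >= 1)."""
    return 2 ** (i - 1) + 2 ** i * (j - 1)

def schedule(n: int) -> Tuple[int, int]:
    """Inverse of stage: map a stage number n >= 1 to the pair (i, j)."""
    if n < 1:
        raise ValueError("stages are numbered from 1")
    i = 1
    while n % 2 == 0:      # strip factors of two; each one moves to the next machine
        n //= 2
        i += 1
    return i, (n + 1) // 2  # n is now odd, n = 2j - 1
```

Machine 1 is served in all odd stages, machine 2 in stages 2, 6, 10, and so on, so machine $i$ receives a $2^{-i}$ fraction of the stages. This constant slowdown per machine is what keeps the overall overhead polynomial.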

As each machine in $M_1', M_2', \ldots$ accepts only strings from $L$, it is clear that
$L(\mathrm{Opt}) \subseteq L$. Let now $M_i$ be some acceptor for $L$. By Lemma 6, $M_i'$ is also an
acceptor for $L$ with $\mathrm{time}_{M_i'}(x) \le p(\mathrm{time}_{M_i}(x) + |x|)$. Now Opt will accept $x$ in
stage $2^{i-1} + 2^i \left(c_i\, p(\mathrm{time}_{M_i}(x) + |x|)^2 + c_i - 1\right)$ due to $M_i'$ (or earlier due to some
other machine). The time needed to reach this stage is bounded by a polynomial
in $\mathrm{time}_{M_i}(x) + |x|$.

5 Sets without an Optimal Acceptor 

In this section we show that one encounters sets which have no optimal acceptor
immediately when one leaves P in the deterministic world. We construct a set
which has an optimal time-bound but has no optimal acceptor. The result implies
that no p-cylinder $\le^p_m$-hard for (e.g.) exponential time has an optimal acceptor
or a p-optimal proof system. On the other hand we show that any set complete
for E under linearly length-bounded many-one reducibility has an optimal
time-bound.

A function $t : \mathbb{N} \to \mathbb{N}$ is called time-constructible if there is a Turing
transducer $M$ and a constant $c > 0$ such that $M$ on input $0^n$ outputs $0^{t(n)}$ in not
more than $c \cdot t(n)$ steps.

Theorem 3. Let $t : \mathbb{N} \to \mathbb{N}$ be a time-constructible function such that for every
polynomial $p$ there is a number $n$ with $p(n) < t(n)$. Then there is a language
$L \in \mathrm{DTIME}(t(n))$ for which the following holds.

• There is no optimal acceptor for $L$.

• $t(n)$ is an optimal time-bound for $L$.

Proof. Let $M_1, M_2, \ldots$ be a standard enumeration of deterministic Turing
acceptors, and let $U$ be a universal machine which on input $0^i 1 x$ can simulate $s$
steps of $M_i$ on input $x$ in time $\le c_i \cdot s^2 + c_i$. For any $i > 0$ let $L_i$ be the regular
language described by the expression $0^i 1 0^*$. Define

$$L_i' = \{x \in L_i \mid U \text{ does not accept } x \text{ in less than } t(|x|) \text{ steps}\},$$

and let $L = \bigcup_{i > 0} L_i'$. Clearly $L \in \mathrm{DTIME}(t(n))$. This construction guarantees
that for any machine $M_i$ accepting $L$ it holds that $L_i' = L_i$ and
$c_i\, \mathrm{time}_{M_i}(x)^2 + c_i \ge t(|x|)$ for $x \in L_i$. Using this one obtains in a
straightforward way that $L$ has the stated properties. We omit the details due to
the limited space.

Observe that any time-constructible function $t$ with $\mathrm{P} \subsetneq \mathrm{DTIME}(t(n))$ fulfills
the condition of Theorem 3.

Together with Theorem 2 and Lemma 5 one obtains the following two
corollaries.

Corollary 2. No set $\le^p_m$-hard for E has a p-optimal proof system.

Corollary 3. No p-cylinder $\le^p_m$-hard for E has an optimal acceptor.

The proof of Theorem 3 does not rely on the monotonicity of time-bounds
implied by the definition of the notion 'time-bound' in this article. Using the
monotonicity of time-bounds one obtains the following theorem. Due to the
limited space we state it without a proof.

Theorem 4. Every set hard for E under linearly length-bounded polynomial
time many-one reducibility has an optimal time-bound.

6 Optimal Nondeterministic Acceptors and 
Optimal Proof Systems 

In this section we briefly sketch how the results in the previous sections can
be translated to the nondeterministic case. Clearly the notion of an optimal
acceptor has a straightforward nondeterministic correspondence. A proof system
$h$ for $L$ is called optimal if every proof system $f$ for $L$ is simulated by $h$, which
means that there is a polynomial $p$ such that for any $x$ in the domain of $f$ there is
a $y$, $|y| \le p(|x|)$, with $h(y) = f(x)$. Basically, a proof system $h$ can be associated
with the nondeterministic acceptor $N$ which on input $x$ guesses $w$ and accepts if
$h(w) = x$. Symmetrically, a nondeterministic acceptor $N$ can be transformed to
a proof system $h$ with $h(\langle x, a \rangle) = x$ if $a$ denotes an accepting path of $N$ on input
$x$. Therefore it is a trivial observation that there is an optimal nondeterministic
acceptor for $L$ if, and only if, there is an optimal proof system for $L$. Nonetheless,
the result corresponding to the remaining equivalence of Theorem 2 is of some
interest. Again this generalizes a result from [?] for TAUT. Due to the limited
space the proof is omitted.
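
The two translations between proof systems and nondeterministic acceptors described above can be illustrated as follows. This is a sketch under explicit assumptions: nondeterministic guessing is replaced by a deterministic brute-force search over all candidate proofs up to a length bound, an accepting path is abstracted by a hypothetical verifier `check_path(x, a)`, and the toy languages (even-length strings, composite numbers) are chosen only for the example.

```python
from itertools import product
from typing import Callable, Optional, Tuple

def acceptor_from_proof_system(h: Callable[[str], Optional[str]],
                               alphabet: str = "01"):
    """Accept x if some proof w with |w| <= max_len satisfies h(w) = x."""
    def accepts(x: str, max_len: int) -> bool:
        for n in range(max_len + 1):
            for letters in product(alphabet, repeat=n):
                if h("".join(letters)) == x:   # a "guessed" proof checks out
                    return True
        return False
    return accepts

def proof_system_from_acceptor(check_path: Callable[[int, int], bool]):
    """h(<x, a>) = x if a encodes an accepting path on x, undefined otherwise."""
    def h(proof: Tuple[int, int]) -> Optional[int]:
        x, a = proof
        return x if check_path(x, a) else None
    return h

# Toy proof system (assumption): even-length strings prove themselves.
even = lambda w: w if len(w) % 2 == 0 else None
accepts = acceptor_from_proof_system(even)

# Toy acceptor (assumption): composite numbers, a path is a nontrivial divisor.
composite_proofs = proof_system_from_acceptor(lambda x, a: 1 < a < x and x % a == 0)
```

The length bound `max_len` corresponds to the running time of the acceptor: short proofs in h translate into short accepting computations, and vice versa.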

Theorem 5. For a p-cylinder $L$, the following statements are equivalent.

1. There is an optimal proof system for $L$.

2. There is an optimal nondeterministic acceptor for $L$.

3. There is a nondeterministic acceptor for $L$ which has a polynomial
time-bound on every $S \in \mathrm{NP}$ with $S \subseteq L$.

4. There is a nondeterministic acceptor for $L$ which has a polynomial
time-bound on every $S \in \mathrm{P}$ with $S \subseteq L$.

With very few modifications the proof of Theorem 3 can be adjusted to the
nondeterministic case. This yields

Theorem 6. Let $t : \mathbb{N} \to \mathbb{N}$ be a time-constructible function such that for every
polynomial $p$ there is a number $n$ with $p(n) < t(n)$. Then there is a language
$L \in \text{co-NTIME}(t(n))$ for which no optimal proof system exists.

It is shown in [?] that any set $\le^p_m$-reducible to a set possessing an optimal
proof system has an optimal proof system, too. Together with Theorem 6 this
yields the following corollary.

Corollary 4. No set $\le^p_m$-hard for co-NE has an optimal proof system.

Abstract. We introduce a new way to measure the space needed in a
resolution refutation of a CNF formula in propositional logic. With the
former definition [?] the space required for the resolution of any unsatisfiable
formula in CNF is linear in the number of clauses. The new definition
allows a much finer analysis of the space in the refutation, ranging from
constant to linear space. Moreover, the new definition makes it possible to relate
the space needed in a resolution proof of a formula to other well stud- 
ied complexity measures. It coincides with the complexity of a pebble 
game in the resolution graphs of a formula, and as we show, has strong 
relationships to the size of the refutation. We also give upper and lower 
bounds on the space needed for the resolution of unsatisfiable formulas. 

1 Introduction and Definitions 

In this paper we deal exclusively with propositional logic, and the only refutation 
system considered is resolution. Due to its simplicity and to its importance in 
automatic theorem proving and logic programming systems, resolution is one of 
the best studied refutation systems. Resolution contains only one inference rule:
If $A \lor x$ and $B \lor \bar{x}$ are clauses, then the clause $A \lor B$ may be inferred by the
resolution rule resolving the variable $x$. A resolution refutation of a conjunctive
normal form (CNF) formula $\varphi$ is a sequence of clauses $C_1, \ldots, C_s$ where each
$C_i$ is either a clause of $\varphi$ or is inferred from earlier clauses in the refutation
by the resolution rule, and $C_s$ is the empty clause, $\Box$. One way to measure
the complexity of resolution applied to a specific formula is to measure the
minimum size of a refutation for it. This is defined as the number of clauses
in the refutation. More than a decade ago, Haken [?] gave the first proof of
an exponential lower bound on the number of clauses needed in any resolution
refutation of a family of formulas expressing the pigeonhole principle. In following
years, the original proof has been greatly simplified and extended to other classes
of formulas [?].
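
As a concrete illustration of the resolution rule, the following Python sketch represents a clause as a `frozenset` of nonzero integers, with `-v` standing for the negation of variable `v` (a common encoding chosen here for convenience, not taken from the paper). The function name `resolve` and the three-clause example formula are likewise assumptions made for the example.

```python
def resolve(c1: frozenset, c2: frozenset, x: int) -> frozenset:
    """Resolvent of c1 (containing x) and c2 (containing -x) on variable x."""
    if x not in c1 or -x not in c2:
        raise ValueError("the clauses do not clash on the given variable")
    return (c1 - {x}) | (c2 - {-x})

# Refutation of (x1) AND (NOT x1 OR x2) AND (NOT x2):
c_a = frozenset({1})
c_b = frozenset({-1, 2})
c_c = frozenset({-2})
c_d = resolve(c_a, c_b, 1)   # derives the unit clause (x2)
c_e = resolve(c_d, c_c, 2)   # derives the empty clause
refutation = [c_a, c_b, c_c, c_d, c_e]
```

The list `refutation` is a resolution refutation of size 5 in the sense defined above: each clause is initial or a resolvent of earlier clauses, and the last one is empty.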
A less studied measure for the complexity of a resolution refutation is the
amount of space it needs. This measure was defined in [?] in the following way:

Definition 1. [?] Let $k \in \mathbb{N}$. We say that an unsatisfiable CNF formula $\varphi$ has
resolution refutation bounded by space $k$ if there is a series of CNF formulas
$\varphi_1, \ldots, \varphi_s$, such that $\varphi = \varphi_1$, $\Box \in \varphi_s$, in any $\varphi_i$ there are at most $k$ clauses,
and for each $i < s$, $\varphi_{i+1}$ is obtained from $\varphi_i$ by deleting (if wished) some of its
clauses and adding the resolvent of two clauses of $\varphi_i$.

Intuitively this expresses the idea of keeping a set of active clauses in the 
refutation, and producing from this set a new one by copying clauses from the 
previous set and resolving one pair of clauses, until the empty clause is included 
in the set. Initially the set of active clauses consists of all the clauses of $\varphi$, and
the space needed is the maximum number of clauses that are simultaneously 
active in the refutation. 

In [?] it is proven that any unsatisfiable CNF formula $\varphi$ with $n$ variables and
$m$ clauses can be refuted in space $m + n$, and in [?] it is observed that the space
upper bound $2m$ can also be obtained.

Although natural, the above definition has the drawback that the space 
needed in a refutation can never be less than the number of clauses in the 
formula being refuted. This is so because this formula is the first one in the se- 
quence used to derive the empty clause. Making an analogy with a more familiar 
computation model, like the Turing machine, this is the same as saying that the 
space needed cannot be less than the size of the input being processed. To be 
able to study problems in which the working space is smaller than the size of the 
input, the space needed in the input tape is usually not taken into consideration. 
We do the same for the case of resolution and introduce the following alternative 
definition for the space needed in a refutation. 

Definition 2. Let $k \in \mathbb{N}$. We say that an unsatisfiable CNF formula $\varphi$ has
resolution refutation bounded by space $k$ if there is a series of CNF formulas
$\varphi_1, \ldots, \varphi_s$, such that $\varphi_1 \subseteq \varphi$, $\Box \in \varphi_s$, in any $\varphi_i$ there are at most $k$ clauses,
and for each $i < s$, $\varphi_{i+1}$ is obtained from $\varphi_i$ by deleting (if wished) some of its
clauses, adding the resolvent of two clauses of $\varphi_i$, and adding (if wished) some
of the clauses of $\varphi$ (initial clauses).

The space needed for the resolution of an unsatisfiable formula is the mini- 
mum k for which the formula has a refutation bounded by space k. 
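
Definition 2 can be turned into a small checker. The Python sketch below is an illustration under stated assumptions: clauses are sets of nonzero integers (`-v` denotes the negated variable `v`), the names `resolvents` and `is_space_bounded_refutation` are hypothetical, and a step is read as adding at most one resolvent. It verifies that a proposed sequence of clause sets satisfies the conditions of the definition for a given bound k.

```python
from itertools import combinations

def resolvents(c1: frozenset, c2: frozenset) -> set:
    """All clauses obtainable by resolving c1 and c2 on one clashing literal."""
    return {(c1 - {lit}) | (c2 - {-lit}) for lit in c1 if -lit in c2}

def is_space_bounded_refutation(phi, steps, k: int) -> bool:
    """Check steps = [phi_1, ..., phi_s] against Definition 2 with bound k."""
    phi = {frozenset(c) for c in phi}
    steps = [{frozenset(c) for c in step} for step in steps]
    # phi_1 consists of initial clauses and phi_s contains the empty clause.
    if not steps or not steps[0] <= phi or frozenset() not in steps[-1]:
        return False
    # No clause set may hold more than k clauses.
    if any(len(step) > k for step in steps):
        return False
    for prev, cur in zip(steps, steps[1:]):
        derived = cur - prev - phi      # neither kept nor an initial clause
        if len(derived) > 1:            # at most one resolvent per step
            return False
        for r in derived:
            if not any(r in resolvents(a, b) for a, b in combinations(prev, 2)):
                return False
    return True

# Example formula (assumption): (x1) AND (NOT x1 OR x2) AND (NOT x2).
phi = [{1}, {-1, 2}, {-2}]
steps = [[{1}, {-1, 2}], [{2}, {-2}], [set()]]
```

In this example the formula has three clauses but is refuted in space 2, which is below the number of initial clauses and therefore impossible under Definition 1.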

In the new definition it is allowed to add initial clauses to the set of active
clauses at any stage in the refutation. Therefore these clauses do not need to
be stored and do not consume much space, since at any moment at most two
of them are needed simultaneously. The only clauses that consume space are
the ones derived at intermediate stages. As we will see in a later section, there are
natural classes of formulas that can be refuted using only logarithmic space (in
the number of initial clauses), or even constant space.

There is another natural way to look at this definition using pebble games on
graphs, a traditional model used for space measures in complexity theory and for
register allocation problems (see 0). Resolution refutations can be represented
as directed acyclic graphs of in-degree two, in which the nodes are the clauses 
used in the refutation, and a vertex (clause) has outgoing edges to the resolvents 
obtained using this clause. In this graph the sources are the initial clauses, all 
the other nodes have in-degree two, and the unique sink is the empty clause. 
In case that in the refutation no derived clauses are reused, that is, when all 
the nodes (except maybe the sources) have out-degree one, the proof is called 
tree-like. 

The space required for the resolution refutation of a CNF formula $\varphi$ (as
expressed in Definition 2) corresponds to the minimum number of pebbles needed
in the following game played on the graph of a refutation of $\varphi$.

Definition 3. Given a connected directed acyclic graph with one sink the aim 
of the pebble game is to put a pebble on the sink of the graph (the only node with 
no outgoing edges) following this set of rules: 

1) A pebble can be placed in any initial node, that is, a node with no predeces- 
sors. 

2) Any pebble can be removed from any node at any time. 

3) A node can be pebbled provided all its parent nodes are pebbled. 

3’) If all the parent nodes of a node are pebbled, instead of placing a new pebble
on it, one can shift a pebble from a parent node.
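The rules can be replayed mechanically. The sketch below assumes a graph given as a dictionary from each node to the tuple of its parents, and a move list using the hypothetical tags "place", "remove" and "shift"; it returns the largest number of pebbles that are on the graph at the same time.

```python
def max_pebbles(parents, sink, moves):
    """Replay a pebbling and return the peak number of pebbles in use.

    parents: dict mapping each node to a tuple of its parent nodes
             (the empty tuple for initial nodes).
    moves:   list of ("place", v), ("remove", v) or ("shift", u, v).
    Raises ValueError if a move breaks the rules or the sink is not pebbled.
    """
    pebbled, peak = set(), 0
    for move in moves:
        kind = move[0]
        if kind == "remove":                      # rule 2
            pebbled.discard(move[1])
        elif kind == "place":                     # rules 1 and 3
            v = move[1]
            if not all(p in pebbled for p in parents[v]):
                raise ValueError(f"cannot place a pebble on {v!r}")
            pebbled.add(v)
        elif kind == "shift":                     # rule 3'
            _, u, v = move
            if u not in parents[v] or not all(p in pebbled for p in parents[v]):
                raise ValueError(f"cannot shift a pebble from {u!r} to {v!r}")
            pebbled.remove(u)
            pebbled.add(v)
        else:
            raise ValueError(f"unknown move {move!r}")
        peak = max(peak, len(pebbled))
    if sink not in pebbled:
        raise ValueError("the sink was never pebbled")
    return peak
```

On a graph with two initial nodes and one common successor, finishing with a shift uses two pebbles, while finishing with a fresh pebble uses three; this is the saving of one pebble that rule 3’ can provide.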

There are different variations of this simple pebble game in the literature. 
In fact, in E! it is shown that the inclusion of rule 3’ in the game can at most 
decrease by one the number of pebbles needed to pebble a graph, but in the worst
case the saving is obtained at the price of squaring the number of moves needed 
in the game. We include rule 3’ so that the number of pebbles coincides exactly
with the space in Definition 2. This fact is stated in the following lemma.

Lemma 1. Let $\varphi$ be an unsatisfiable CNF formula. The space needed in a resolution
refutation of $\varphi$ coincides with the number of pebbles needed for the pebble
game played on the graph of a resolution refutation of $\varphi$.

This second characterization of space in resolution proofs allows us to use
techniques introduced for estimating the number of pebbles required for
pebbling certain graphs in order to compute the space needed in resolution
refutations. However, estimating the number of pebbles needed in the refutation
of a formula is harder than estimating the number of pebbles needed for a
graph, since in the first case one has to consider all the possible refutation graphs
for the formula. From now on we will refer interchangeably to the space needed for
the refutation of a formula and to the number of pebbles needed for the game on
its refutation graphs.

In Section 3 we give upper and lower bounds for the amount of space needed
for resolution. When measuring the space relative to the number of variables in
the initial formula we show that any unsatisfiable CNF formula with $n$ variables
has a resolution proof that uses space $n + 1$, and we also obtain a matching lower
bound, that is, we show that there are formulas on $n$ variables whose refutation
needs space $n + 1$. If we are interested in measuring the space relative to the
number of initial clauses, we can prove that any unsatisfiable CNF formula with
$m$ clauses can be resolved in space $m$, but the best lower bound we get is $\log m$
for the space needed in the refutation of certain formulas. Later, in Section 4,
this lower bound is improved for the cases of tree-like and regular resolution,
two restrictions of the resolution procedure. These results are obtained from an
upper bound on the size of a refutation of a formula in terms of the space needed
for its resolution and the depth of the refutation. (The depth of a refutation is
the size of the longest path from the empty clause to an initial clause in the
refutation graph.) We prove in Theorem 6 that if a formula has a resolution
refutation of depth $d$ that uses space $s$, then it has a resolution refutation of size
at most $2^s \cdot d \cdot s^3$. For types of resolution in which the depth of the proofs is bounded
(as in the case of regular resolution), this provides an exponential upper bound
for the resolution size in terms of the resolution space.

We include at the end of the paper a section of conclusions and open prob- 
lems. Due to space reasons some of the proofs are not included in this version 
of the paper. 

2 Some Examples 

In this section we give two examples of families of unsatisfiable formulas that
can be refuted within less space than their number of clauses. The first example
consists of the formulas whose clauses are all possible combinations of literals in such a
way that every variable appears once in every clause. We will see that the space
needed to refute these formulas is bounded by the number of different variables
in them. In fact we will prove a more general result about the space needed in a
tree-like resolution.

Definition 4. We say that a graph $G_1$ is embedded in a graph $G_2$ if a graph
isomorphic to $G_2$ can be obtained from $G_1$ by adding nodes and edges or inserting
nodes in the middle of edges of $G_1$.

Observe that the number of pebbles needed for pebbling any graph is greater 
than or equal to the number of pebbles needed for pebbling any embedded 
subgraph in it. This is true since any pebbling strategy for the graph, also pebbles 
the embedded subgraph. 

Theorem 1. Let $\varphi$ be an unsatisfiable CNF formula with a tree-like resolution
of size $s$, then $\varphi$ has a resolution refutation of space $[\log s] + 1$.

Proof. We will show that the resolution tree in the refutation of $\varphi$ can be pebbled
with $d + 1$ pebbles, where $d$ is the depth of the biggest complete binary tree
embedded in the refutation graph. As the biggest possible complete binary tree
embedded in a tree of size $s$ has depth $[\log s]$, the theorem holds. It is a well
known fact (see for example 0) that $d + 1$ pebbles suffice to pebble a complete
binary tree of depth $d$ (with the directed edges pointing to the root). In fact
$d + 1$ pebbles suffice to pebble any binary tree whose biggest embedded complete
binary tree has depth $d$. In order to see this we use induction on the size of the
tree. The base case is obvious. Let $T$ be the refutation tree, and $T_1$ and $T_2$ be the
two subtrees from the root. Let us call $d_c(T)$ the depth of the biggest embedded
complete binary subtree in $T$. So

$$d_c(T) = \begin{cases} \max(d_c(T_1), d_c(T_2)) & \text{if } d_c(T_1) \neq d_c(T_2) \\ d_c(T_1) + 1 & \text{if } d_c(T_1) = d_c(T_2) \end{cases}$$

By induction hypothesis one can pebble $T_1$ with $d_c(T_1) + 1$ pebbles and $T_2$
with $d_c(T_2) + 1$ pebbles. Let us suppose that $d_c(T_1) < d_c(T_2)$; then $d_c(T) = d_c(T_2)$
and one can pebble first $T_2$ with $d_c(T_2) + 1$ pebbles, leave a pebble in the root
of $T_2$ and then pebble $T_1$ with $d_c(T_1) + 1$. For this second part of the pebbling
one needs $d_c(T_1) + 2 \le d_c(T_2) + 1$. The other case is similar. ■
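The recurrence for $d_c$ and the pebbling strategy of the proof can be written down directly. This is a sketch only, assuming a binary tree encoded as nested tuples (a leaf is the empty tuple, an inner node is a pair of subtrees); the names `dc` and `pebbles_used` are hypothetical.

```python
def dc(tree):
    """Depth of the biggest complete binary tree embedded in `tree`.

    A leaf is the empty tuple (); an inner node is a pair (left, right).
    """
    if tree == ():
        return 0
    d1, d2 = dc(tree[0]), dc(tree[1])
    # Equal depths combine into a complete tree that is one level deeper.
    return d1 + 1 if d1 == d2 else max(d1, d2)


def pebbles_used(tree):
    """Pebbles used by the strategy of the proof of Theorem 1.

    The more expensive subtree is pebbled first, one pebble stays on its
    root, the other subtree is pebbled next, and a pebble is finally
    shifted to the common root (rule 3').
    """
    if tree == ():
        return 1
    cheap, costly = sorted((pebbles_used(tree[0]), pebbles_used(tree[1])))
    return max(costly, cheap + 1)
```

On every tree of this form `pebbles_used(tree)` equals `dc(tree) + 1`: a complete binary tree of depth 2 needs three pebbles, while a long path with a leaf hanging off each node needs only two.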

We can apply the above theorem to compute the space needed in the refutation
of the following formula.

Definition 5. Let $n \in \mathbb{N}$. COMPLETE-TREE$_n$ is the CNF formula on the set
of variables $\{x_1, \ldots, x_n\}$ whose clauses are all possible combinations of literals
with the restriction that each variable appears once in each clause.

COMPLETE-TREE$_n$ $= (x_1 x_2 \cdots x_n), (\bar{x}_1 x_2 \cdots x_n), \ldots, (\bar{x}_1 \bar{x}_2 \cdots \bar{x}_n)$.

Observe that this formula has $2^n$ clauses. It is not hard to see that COMPLETE-TREE$_n$
can be refuted using space $n + 1$. This is so since a straightforward tree-like
resolution of the formula that resolves the variables in different stages has size
$2^{n+1} - 1$. By the argument of Theorem 1, this refutation, a complete binary tree
of depth $n$, can be pebbled with $n + 1$ pebbles. In the next section we will see
that this amount of space is also necessary.
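For small $n$ the formula can be generated and inspected directly. The sketch below assumes the integer encoding of literals ($v$ for $x_v$, $-v$ for $\bar{x}_v$) and uses a brute-force satisfiability test; the function names are hypothetical.

```python
from itertools import product


def complete_tree(n):
    """Clauses of COMPLETE-TREE_n: one clause per sign pattern over x_1..x_n."""
    return [frozenset(sign * var for sign, var in zip(signs, range(1, n + 1)))
            for signs in product((1, -1), repeat=n)]


def is_satisfiable(clauses, n):
    """Brute-force test over all 2^n assignments (only sensible for small n)."""
    for bits in product((False, True), repeat=n):
        # A literal l is true when the value of its variable matches its sign.
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False
```

With this encoding `complete_tree(3)` has eight clauses and is unsatisfiable, and dropping any single clause makes it satisfiable, so the formula is minimally unsatisfiable in the sense of Definition 6.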

As a second example, consider the class of unsatisfiable formulas in CNF with
at most two literals per clause.

Theorem 2. Any unsatisfiable CNF formula with at most two literals in each 
clause can be resolved within constant space. 

3 Upper and Lower Bounds 

For the results in this section the following concept will be very useful. 

Definition 6. We say that a CNF unsatisfiable formula is minimally unsatisfi- 
able if removing any clause the formula becomes satisfiable. 

The next result, attributed to Tarsi, has been proved independently many times.

Lemma 2. Any minimally unsatisfiable CNF formula must have more clauses
than variables.

We start by giving bounds with respect to the number of variables.

Theorem 3. Every unsatisfiable formula with n variables can be resolved using 
resolution in space at most n + 1 . 

Proof. As mentioned in the proof of Theorem 1, for pebbling a tree of depth $d$,
$d + 1$ pebbles suffice. If we consider regular tree-like resolution, which is complete,
we have refutation trees whose depth is at most the number of variables in the
formula being refuted. ■

There is a matching lower bound, since there are formulas of $n$ variables whose
refutation graphs can only be pebbled with $n + 1$ pebbles. This is a consequence
of the following result:

Theorem 4. Let $\varphi$ be an unsatisfiable CNF formula and $k$ the smallest number of
literals of a clause of $\varphi$. Any resolution refutation of $\varphi$ needs at least space $k + 1$.

Proof. It is a well known fact that in a resolution refutation, for every truth 
assignment of the variables there is a unique path that goes from the empty 
clause to an initial clause, and the assignment gives value false to all the clauses 
in the path. Let us call these paths truth assignment paths. 

For any pebbling strategy, there is a first step, let us call it i, in which the set 
of pebbled clauses becomes unsatisfiable. This step must exist because the first 
pebbling step consists of pebbling an initial clause, which is always satisfiable, 
and the last step pebbles the empty clause. In step $i - 1$, there was a path in
the resolution graph that goes from the empty clause to an initial clause and does
not contain any pebbles. Otherwise the set of pebbled clauses in step $i - 1$ would
be unsatisfiable, since every truth assignment would make false at least one of the
pebbled clauses (the ones in the truth assignment path).

In step i, an initial clause has to be pebbled since according to the pebbling 
rules the only other possibility would be to pebble a clause with both parents 
pebbled, and this step would not transform the set of pebbled clauses into an 
unsatisfiable set. Therefore the set of pebbled clauses at step i contains at least 
k variables (the ones of the initial clause) . 

Let us suppose that the set of pebbled clauses at step $i$ is minimally
unsatisfiable; then, by Lemma 2, it has at least $k + 1$ clauses because it has at
least $k$ variables. On the other hand, if this set is not minimally unsatisfiable, we
can discard clauses until the remaining set becomes minimally unsatisfiable.
Notice that we cannot delete the initial clause last added to the set, since otherwise
the set of clauses would be a subset of the clauses at step $i - 1$ and would
therefore be satisfiable. So $k + 1$ clauses are still needed, because the initial clause
is contained in the set and has at least $k$ variables. ■

Since all the clauses in COMPLETE-TREE$_n$ have $n$ variables, we obtain:

Corollary 1. For all $n \in \mathbb{N}$, any resolution refutation of COMPLETE-TREE$_n$
requires at least space $n + 1$.

Theorem 4 can be strengthened to prove lower bounds for the space
needed in the refutation of a more general class of formulas. This is done in
Theorem 5 using the following lemma.

Let $\varphi$ be a CNF formula, and $\alpha$ a (partial) truth assignment to the variables in
$\varphi$. $\varphi_\alpha$ is a modification of $\varphi$ according to $\alpha$. For every variable $v$ in $\alpha$, if its truth
value is 1, all the clauses in $\varphi$ containing the positive literal $v$ are deleted and
all occurrences of $\bar{v}$ are deleted. If the truth value of $v$ is 0, then all clauses in $\varphi$
containing $\bar{v}$ are deleted and all occurrences of the literal $v$ are deleted.
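The construction of $\varphi_\alpha$ is mechanical. A minimal sketch, assuming clauses are collections of nonzero integers ($-v$ for $\bar{v}$) and $\alpha$ is a dictionary from variables to 0 or 1; the name `restrict` is hypothetical.

```python
def restrict(clauses, alpha):
    """Return the clauses of phi_alpha.

    clauses: iterable of clauses, each an iterable of nonzero ints.
    alpha:   dict mapping some variables (positive ints) to 0 or 1.
    """
    result = []
    for clause in clauses:
        # A clause containing a literal made true by alpha is deleted.
        if any(abs(l) in alpha and alpha[abs(l)] == (1 if l > 0 else 0)
               for l in clause):
            continue
        # The remaining assigned literals are false and are removed.
        result.append(frozenset(l for l in clause if abs(l) not in alpha))
    return result
```

For the clauses $(x_1 x_2)$, $(\bar{x}_1 x_3)$, $(\bar{x}_2)$ and $\alpha(x_1) = 1$, the first clause is deleted and the second shrinks to $(x_3)$.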

The proof of the next lemma is an easy adaptation of Q Theorem 1] . 

Lemma 3. Let $R$ be a resolution refutation of the CNF formula $\varphi(p, q)$, where
$p \cup q$ are the variables of $\varphi$, and let $\alpha$ be any truth assignment of $p$. Then there is
a resolution refutation of $\varphi(\alpha, q)$ whose resolution graph is embedded in $R$.



Theorem 5. Let $\varphi$ be an unsatisfiable CNF formula, and let $k$ be the maximum
over all partial assignments $\alpha$ of the minimum number of literals of a clause in
$\varphi_\alpha$. The space needed in a resolution refutation of $\varphi$ is at least $k$.

The upper and lower space bounds measured with respect to the number 
of variables coincide. We have not been able to prove a matching result when 
measuring the space with respect to the number of initial clauses. Observe that 
from Lemma 2 and Theorem 3 we immediately obtain:

Corollary 2. Every unsatisfiable formula with m clauses can be resolved using 
resolution in space at most m. 

Proof. Just consider a minimally unsatisfiable subset of the initial clauses. This
subset contains at most $m - 1$ variables. ■

However, Theorems 4 and 5 can only provide a lower bound of $\log m$ for the
space needed in the refutation of any formula. In the next section we improve
this lower bound for some restrictions of resolution.



4 Relationships between Space and Size 

The main result of this section provides an upper bound on the size of resolution 
refutations of a formula, in terms of the space and the depth needed in a refu- 
tation. Recall that the depth of a resolution refutation is the size of the longest 
path from the empty clause to an initial clause in the graph of the refutation. 

Theorem 6. If an unsatisfiable CNF formula $\varphi$ has a resolution refutation of
depth $d$ and space $s$, then $\varphi$ has a resolution refutation of size at most $2^s \cdot d \cdot s^3$.

Proof. Let $R$ be the resolution refutation of depth $d$ that can be pebbled
with $s$ pebbles. As in Theorem 4, one can follow the pebbling strategy, placing
pebbles in $R$ until the set of pebbled clauses becomes unsatisfiable for the first
time. Let us call this set of clauses $\varphi_1$. One can then start the pebbling
strategy again from the beginning until all the clauses in the set $\varphi_1$ (except
its initial clauses) can be inferred using resolution from the new set of pebbled
clauses. Define this set to be $\varphi_2$. It always exists since the clauses in $\varphi_1$ are part
of the refutation and therefore have been inferred using resolution. In the same
way one can define the sets of clauses $\varphi_3$, $\varphi_4$, and so on. Observe that for any $i$,
$\varphi_i$ contains at most $s$ clauses, since all of them are simultaneously pebbled at a
certain stage.

The last set in this series is the set $\varphi_z$ formed by a set of clauses that can be
inferred from the first $s$ initial clauses being pebbled. In order to see how large
$z$ can be, define $d_i$ to be the sum of the depths in $R$ of all the clauses in $\varphi_i$ and
$p_i$ to be the number of clauses in $\varphi_i$. Clearly there cannot be two sets with the
same pair $(d_i, p_i)$, since one of the sets is inferred from the other one (and from
some initial clauses) and therefore all the clauses in both sets cannot be at the
same depth. Since for every $i$, $d_i$ is smaller than $d \cdot s$ and $p_i \le s$, we get $z \le d \cdot s^2$.

We show now how to build a new refutation of bounded size. The idea is
to infer sequentially from $\varphi_{i+1}$ the clauses in $\varphi_i$. Since these sets have bounded
size, the number of resolution steps in between can also be bounded. This is
clear for the last step since $\varphi_1$ is an unsatisfiable set with at most $s$ clauses,
and by Lemma 2 it contains an unsatisfiable set of clauses with at most $s - 1$
variables. From such a set the empty clause can be derived with a refutation
of size bounded by $2^s$ (using for example regular tree-like resolution, which is
refutation complete). In order to prove this bound for a derivation of $\varphi_i$ from
$\varphi_{i+1}$ for all $i$, we will start from the initial clauses, successively inferring new sets of
clauses $\varphi'_i$ from which the clauses in $\varphi'_{i-1}$ can still be derived. The clauses in $\varphi'_i$
have the property that for every clause $C$ in $\varphi_i$ there is a clause $C'$ in $\varphi'_i$ whose
literals are a subset of those of $C$. Because of this property, from $\varphi'_1$ the empty
clause can still be derived.

We start inferring the clauses in $\varphi_z$. Let $C$ be any clause in this set. We
consider the derivation of $C$ from (some of) the initial clauses. Let us call these
clauses $\varphi^C$ and this proof $R^C$. We apply Lemma 3 to $R^C$, defining a partial truth
assignment on the variables of $C$. The partial truth assignment used is:

$$\alpha(v) = \begin{cases} 1 & \text{if literal } \bar{v} \in C \\ 0 & \text{if literal } v \in C \end{cases}$$

We get a proof of the empty clause $\Box$ from $\varphi^C_\alpha$. $\varphi^C_\alpha$ is an unsatisfiable set
and has at most $s$ clauses. We get rid of all useless clauses in order to transform
it into a minimally unsatisfiable formula denoted by $\varphi'^C_\alpha$. By Lemma 2, $\varphi'^C_\alpha$ has
at most $s - 1$ variables, so there is a refutation of $\varphi'^C_\alpha$ with at most $2^s$ clauses.
This proof has none of the literals in $C$. Adding them again to the new proof we
get a derivation of a clause $C' \subseteq C$. If this clause is different from $C$ we modify
$\varphi_z$ by substituting $C$ by $C'$. This is done for all the clauses in $\varphi_z$, obtaining the
set $\varphi'_z$. For this derivation at most $2^s \cdot s$ clauses are needed.

For the next step we have to infer a (possibly modified) set $\varphi'_{z-1}$ from the
set $\varphi'_z$. For every clause $C$ in $\varphi_{z-1}$ we derive a clause $C' \subseteq C$. Let us call $\varphi^C$ the
set of clauses from the original $\varphi_z$ that were used to derive $C$. Now we possibly
do not have this set, but there is a set $\varphi'^C$ containing a subclause of each of
the clauses in $\varphi^C$. With this and the argument used in the previous step we can
derive the clause $C'$. Finally from $\varphi'_1$ (if not before), the empty clause is derived.

We have shown that in order to infer $\varphi'_i$ from $\varphi'_{i+1}$ at most $s \cdot 2^s$ clauses are
needed. Since there are at most $z \le d \cdot s^2$ intermediate sets of clauses, the total
size of the new refutation is bounded by $2^s \cdot d \cdot s^3$. ■

We get different consequences from this result, for example: 

Corollary 3. The set of unsatisfiable CNF formulas with resolution refutations 
of polynomial depth and logarithmic space, have resolution refutations of poly- 
nomial size. 

In some types of resolution, the depth of the proof is automatically bounded. 
For example in regular resolution it is required that in every path from the empty 
clause to an initial clause in the refutation graph, every variable appears at most 
once. Clearly in this case the number of variables is a bound on the depth of the 
proof. 

Corollary 4. If an unsatisfiable CNF formula on $n$ variables has a regular
resolution refutation of space $s$, then it has a resolution refutation of size at most
$2^s \cdot n \cdot s^3$.

For the case of regular resolution this upper bound allows us to improve the
lower bound on the space of a refutation measured with respect to the number
of initial clauses stated in Section 3.

Beame and Pitassi show that for sufficiently large $n$, any resolution
refutation of the formula expressing the pigeonhole principle for $n$ pigeons, PHP$^n_{n-1}$,
requires size at least $2^{n/20}$. As a direct consequence of the above theorem we get:

Corollary 5. For sufficiently large $n$, any regular resolution refutation of the
formula PHP$^n_{n-1}$ requires space $\Omega(n)$.
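The calculation behind this corollary can be sketched as follows. The sketch assumes the standard encoding of the pigeonhole principle with one variable per pigeon and hole, hence $N = n(n-1) \le n^2$ variables; this encoding is an assumption, since it is not spelled out in the text above.

```latex
% Corollary 4 with N variables, combined with the size lower bound 2^{n/20}:
\[
  2^{n/20} \;\le\; 2^{s} \cdot N \cdot s^{3}
  \quad\Longrightarrow\quad
  s \;\ge\; \frac{n}{20} - \log N - 3 \log s .
\]
% If s >= n/20 there is nothing to prove. Otherwise s < n, so \log s < \log n,
% and with \log N \le 2 \log n we obtain
\[
  s \;\ge\; \frac{n}{20} - 2 \log n - 3 \log n
    \;=\; \frac{n}{20} - 5 \log n \;=\; \Omega(n).
\]
```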

Since the formula PHP$^n_{n-1}$ has $O(n^3)$ clauses, measured in terms of the number
$m$ of initial clauses of this formula this means a lower bound of $\Omega(\sqrt[3]{m})$ for
the space needed for its regular refutation.

For the case of tree-like resolution, Theorem 0 and the mentioned bound for
PHP$^n_{n-1}$ provide the following result:

Corollary 6. For sufficiently large $n$, any tree-like resolution refutation of the
formula PHP$^n_{n-1}$ requires space $n/20$.

An interesting question is whether the depth of the refutation can be taken
out of the bound given by Theorem 6. A way to do this would be by showing
that a refutation of a formula can be transformed into another one that uses the 
same amount of space, but has bounded depth. It is not clear at all that this 
result holds, but as we see in our next result, it does hold for the case of tree-like 
resolution. 

Theorem 7. If $\varphi$ is an unsatisfiable CNF formula with a tree-like resolution
refutation of space $s$, then $\varphi$ has a tree-like regular resolution refutation that
uses the same amount of space.

5 Conclusions and Open Problems

We have introduced a new definition to measure the space needed in the resolution
of an unsatisfiable formula. This definition is more natural than the existing
one since it is closer to space measures in other complexity models and can be 
characterized in terms of a well studied pebble game. We have obtained upper 
and, in some cases, matching lower bounds for the space needed, as well as rela- 
tionships between the space and the size of a refutation. These results bring new 
insight in the structure of resolution and hopefully will be useful in the analysis 
of refutations. 

There are however several interesting problems that remain open. One of
them is to match the upper and lower bounds for the space needed for general
resolution, measured in terms of the number of initial clauses. Recall from
Section 3 that the bounds we have are, respectively, $m$ and $\log m$. Other important
questions are whether Theorem 6 can be modified so that the depth is not a
parameter in the right side of the upper bound for the size, or whether it is true
that every unsatisfiable formula that can be resolved in logarithmic space has
a resolution refutation of polynomial size (an improvement of Corollary 3).

Abstract. The problem instance of Vertex Cover consists of an undirected
graph $G = (V, E)$ and a positive integer $k$; the question is whether
there exists a subset $C \subseteq V$ of vertices such that each edge in $E$ has at
least one of its endpoints in $C$ with $|C| \le k$. We improve two recent
worst case upper bounds for Vertex Cover. First, Balasubramanian et
al. showed that Vertex Cover can be solved in time $O(kn + 1.32472^k k^2)$,
where $n$ is the number of vertices in $G$. Afterwards, Downey et al. improved
this to $O(kn + 1.31951^k k^2)$. Bringing the exponential base significantly
below 1.3, we present the new upper bound $O(kn + 1.29175^k k^2)$.



1 Introduction 

Vertex Cover is a problem of central importance in computer science: 

— It was among the first NP-complete problems 0.

— There have been numerous efforts to design efficient approximation algo- 
rithms PI , but it is also known to be hard to approximate Q . 

— It is of central importance in parameterized complexity theory and has one 
of the most efficient fixed parameter algorithms P], which is also subject 
of pi E] and this paper. 

— It has important applications, e.g., in computational biochemistry, where it 
is used to resolve conflicts between sequences by excluding some of them 
from a sample and, for this reason, the algorithm of Balasubramanian et 
al. 13 has been implemented as part of the DARWIN project at ETH Zurich 
laiS]. In particular, exact algorithms are important here. 

An instance of Vertex Cover is an undirected graph $G = (V, E)$ and a positive
integer $k$. The question is whether there exists a vertex cover set $C \subseteq V$ with
$|C| \le k$ such that for all edges $(u, v)$ in $E$, it holds that $u \in C$ or $v \in C$. A
straightforward greedy algorithm shows that Vertex Cover is approximable to
a ratio 2 (cf. M). However, unless P = NP, Vertex Cover has no polynomial
time approximation scheme Q and it is known to be not approximable to a
ratio 1.1666 p.

* Supported by a Feodor Lynen fellowship of the Alexander von Humboldt-Stiftung,
Bonn, and the Center for Discrete Mathematics, Theoretical Computer Science and
Applications (DIMATIA), Prague, Czech Republic.

Vertex Cover has seen quite some history of progress with respect to fixed
parameter algorithms (“fixed parameter” refers to $k$, see pun for details).
Recently, Balasubramanian et al. [2] came up with a greatly improved fixed
parameter algorithm for Vertex Cover, running in time $O(kn + 1.324718^k k^2)$. They
employ an intricate, improved search tree algorithm. Very recently, this result
was slightly improved to $O(kn + 1.31951^k k^2)$ by Downey et al. [5]. Note that
according to the authors this “tiny difference amounts to a 21% improvement in
the running time for k = 60.” In the following we prove a better upper bound
of $O(kn + 1.29175^k k^2)$, thus breaking the 1.3 barrier in the base of the
exponential term. Adopting the above example for $k = 60$, our new result means
an improvement of 78% over the result of Balasubramanian et al. A technical
report H21 contains all details that had to be omitted due to lack of space.
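The two percentages can be checked by comparing only the exponential terms for $k = 60$. The following sketch ignores the polynomial factors and the additive $kn$ term, which is an assumption about how the quoted figures were obtained; the function name `improvement` is hypothetical.

```python
def improvement(old_base, new_base, k):
    """Relative saving in the exponential term when the base drops."""
    return 1.0 - (new_base / old_base) ** k


# Downey et al. compared with Balasubramanian et al., for k = 60.
print(round(100 * improvement(1.324718, 1.31951, 60)))
# The bound of this paper compared with Balasubramanian et al., for k = 60.
print(round(100 * improvement(1.324718, 1.29175, 60)))
```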



2 Preliminaries and Basic Notation 



Let $G = (V, E)$ be an undirected graph. A set $C \subseteq V$ is a vertex cover of $G$ if
for every edge $(i, j) \in E$, either $i \in C$ or $j \in C$ or both $i, j \in C$. A vertex cover
is minimal or optimal if it has minimum size, i.e., if there is no vertex cover
that has fewer vertices. By $N(x)$ we denote the set of neighbors, i.e., adjacent
vertices, of a vertex $x$. For ease of notation, we often write $\{x, N(y)\}$ instead
of $\{x\} \cup N(y)$ or $N(\{x, y\})$ instead of $N(x) \cup N(y)$ to denote sets of vertices. A
graph is called $r$-regular if every vertex has degree $r$; it is called regular if it is
$r$-regular for some $r$. A graph is connected if there is a path between each pair
of vertices. A component of a graph is a maximal connected subgraph. Three
vertices $a, b, c$ are a bridge of a vertex $x$ if $x, a, b, c$ form a cycle (a closed path).
We say $b$ is a bridge vertex of $x$. A cycle of length 3 is a triangle.

Our algorithm works recursively. The number of recursions is the number of
nodes in the corresponding tree. This number is governed by homogeneous linear
recurrences with constant coefficients. It is well known how to solve them, and the
asymptotic solution is determined by the roots of the characteristic polynomial.
We use the same notation as Kullmann and Luckhardt m. If the algorithm
solves a problem of size $n$ and calls itself recursively for problems of sizes
$n - d_1, \ldots, n - d_k$, then $(d_1, \ldots, d_k)$ is called the branching vector of this recursion.
It corresponds to the recurrence $t_n = t_{n-d_1} + \cdots + t_{n-d_k}$ with the characteristic
polynomial $z^d = z^{d-d_1} + \cdots + z^{d-d_k}$, where $d = \max\{d_1, \ldots, d_k\}$. If $\alpha$ is a root
of the characteristic polynomial with maximum absolute value, then $t_n$ is $\alpha^n$
up to a polynomial factor. We call $|\alpha|$ the branching number that corresponds
to the branching vector $(d_1, \ldots, d_k)$. Moreover, if $\alpha$ is a single root, then even
$t_n = O(\alpha^n)$, and all branching numbers that will occur in this paper are single
roots.

In this paper, the size of the search tree is therefore $O(\alpha^k)$, where $k$ is the
parameter and $\alpha$ is the biggest branching number that will occur; it is about
1.291742754 and belongs to the branching vector (3, 5, 8, 8) which occurs in Sec-
tion|3 (Case 5.2.1).
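Branching numbers are easy to compute numerically. The sketch below uses bisection, which is our choice of method and not something prescribed by the paper: dividing the characteristic polynomial by $z^d$ gives $1 = z^{-d_1} + \cdots + z^{-d_k}$, and the right-hand side is strictly decreasing for $z > 0$, so there is exactly one positive root. The function name `branching_number` is hypothetical.

```python
def branching_number(vector, tol=1e-12):
    """Positive real root of z^d = sum of z^(d - d_i) over the branching vector.

    vector: positive integers d_1, ..., d_k (at least one entry).
    """
    def f(z):
        # f is increasing for z > 0 and vanishes exactly at the root.
        return 1.0 - sum(z ** (-d) for d in vector)

    lo, hi = 1.0, float(len(vector)) + 1.0   # f(lo) <= 0 < f(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return hi
```

For the branching vector (1, 1) of the simple binary search tree this yields 2, and for (3, 5, 8, 8) it agrees with the value 1.291742754 quoted above to the precision shown.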

3 General Outline of the Algorithm 



Our algorithm works, in essence, as all previous algorithms for Vertex Cover.
The main part is to build a bounded search tree: To cover an edge, we have
to put at least one of its two endpoints into the (optimal) vertex cover set.
Thus, starting with an arbitrary edge, we can make a binary decision between
its two endpoints. In each subcase, we delete the corresponding vertex chosen
and its incident edges and repeat this until we have built a search tree of size $2^k$.
Altogether, it is easy to see that this leads to an algorithm running in time
$O(2^k n)$, where $n$ denotes the number of vertices in the graph. All results
(including ours) to get more efficient algorithms are based on efforts to make
the search tree smaller. So, Balasubramanian et al. [2] presented an algorithm
with search tree size $1.32472^k$ and this was improved to $1.31951^k$ by Downey et
al. [5]. We further improve this size to $1.29175^k$.

Before we give an overview of our approach we still have to explain briefly a
technique called reduction to problem kernel, which is a kind of preprocessing.
The main idea is that vertices of degree $> k$ must be part of a vertex cover if its
size is at most $k$. Deleting all those vertices and their incident edges leaves a
graph which can still be very big. If, however, it is connected and bigger than
$2k^2$ then there cannot exist a vertex cover of size $k$ since there are more than
$k^2$ edges. Hence, after reduction
to problem kernel we can assume that the size of the graph is at most $2k^2$.
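The preprocessing can be sketched as follows. This is a common variant and not necessarily the exact procedure the authors have in mind: it assumes a simple graph given as a list of vertex pairs, re-applies the degree rule with the remaining budget after each forced vertex, and counts edges rather than vertices in the final test. The name `reduce_to_kernel` is hypothetical.

```python
def reduce_to_kernel(edges, k):
    """Reduction to problem kernel for Vertex Cover (a sketch).

    Returns (remaining_edges, forced_vertices, remaining_budget), or None
    if no vertex cover of size at most k can exist.
    """
    edges = {frozenset(e) for e in edges}
    forced = set()
    while True:
        budget = k - len(forced)
        if budget < 0:
            return None
        degree = {}
        for e in edges:
            for v in e:
                degree[v] = degree.get(v, 0) + 1
        # A vertex with more neighbours than the budget must be in the cover.
        high = [v for v, d in degree.items() if d > budget]
        if not high:
            break
        forced.add(high[0])
        edges = {e for e in edges if high[0] not in e}
    # Every remaining vertex covers at most `budget` edges.
    if len(edges) > budget * budget:
        return None
    return edges, forced, budget
```

On a star with five leaves and $k = 1$ the centre is forced into the cover and nothing remains; on a matching with five edges and $k = 2$ the final counting test rejects the instance.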

In parameterized complexity theory the resulting algorithm is known as Buss ’ 
algorithm ^ , but basically the same approach can be traced back to older ideas 
from VLSI-theory, e.g., Evans [^. It is not difficult to see, using appropriate
subalgorithms, that Buss’ algorithm has running time $O(kn + (2k^2)^k k^2)$. Combining
reduction to problem kernel with the search tree algorithm described before, we
easily get an $O(kn + 2^k k^2)$ algorithm for Vertex Cover. All subsequent
improvements concentrated on replacing the exponential term $2^k$ by a smaller one.

The algorithm we describe is closer in spirit to the one of Balasubramanian
et al. [2] than to that of Downey et al. [5]. The main difference between both
approaches is that Downey et al. employ a different reduction to problem kernel,
which not only works as preprocessing, but is also applied during the search
tree construction. We refer to Downey et al. [5] for details. By way of contrast,
Balasubramanian et al. [2] and we use the “more classical approach” where the
search tree deals also with vertices of degree 2 and 3 and reduction to problem
kernel is only applied once as a preprocessing phase. In the rest of the paper, we
concentrate on shrinking the search tree size.

3.1 Overall Structure of the New Search Tree Algorithm

The algorithm recursively finds an optimal vertex cover as follows. Given a
graph $G$, we choose several subgraphs $G_1, \ldots, G_k$ and compute optimal vertex
covers for all of them. From them we can construct an optimal vertex cover for $G$.
For example, let $x$ be some vertex of $G$ and let $G_1$ be the subgraph that results
from $G$ by deleting $x$ and all incident edges. A vertex cover of $G_1$, together with
$x$, is then a vertex cover of $G$. Moreover, if there is an optimal vertex cover for $G$
that contains $x$, then we can construct an optimal vertex cover from an optimal
vertex cover of $G_1$. Otherwise, if no optimal vertex cover of $G$ contains $x$, they
must contain all neighbors of $x$. Hence, let $G_2$ be the graph that results from
$G$ by deleting all neighbors of $x$. Again, we can construct a vertex cover of $G$
by taking a vertex cover of $G_2$ and adding all neighbors of $x$. If we start from
optimal vertex covers for $G_1$ and $G_2$, then one of the resulting covers for $G$ must
be optimal, since either $x$ or its neighbors must be part of any vertex cover. We
say we branch according to $x$ and $N(x)$, where $N(x)$ denotes the neighbors of $x$.
In the first branch, $x$ will be part of the vertex cover and in the second branch
it will be $N(x)$. The vertex cover constructed grows in size with each step. Since
its size cannot exceed the goal $k$, the algorithm terminates.
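The basic branching step described above, without any of the refined rules that follow, can be sketched as a short recursive procedure. The sketch assumes a simple graph given as a list of vertex pairs and answers the decision version with budget $k$; the function name `vertex_cover` is hypothetical.

```python
def vertex_cover(edges, k):
    """Return a vertex cover with at most k vertices, or None if none exists.

    Branches according to x and N(x) for an arbitrary non-isolated vertex x.
    """
    edges = {frozenset(e) for e in edges}
    if not edges:
        return set()
    if k <= 0:
        return None
    x = next(iter(next(iter(edges))))
    neighbours = {v for e in edges if x in e for v in e if v != x}
    # First branch: x is part of the cover.
    cover = vertex_cover({e for e in edges if x not in e}, k - 1)
    if cover is not None:
        return cover | {x}
    # Second branch: all neighbours of x are part of the cover.
    if len(neighbours) <= k:
        rest = {e for e in edges if not (e & neighbours)}
        cover = vertex_cover(rest, k - len(neighbours))
        if cover is not None:
            return cover | neighbours
    return None
```

On a triangle the call with $k = 1$ fails and the call with $k = 2$ returns two of the three vertices. The refined rules below only change which sets are branched on, not this overall scheme.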

In principle that is the way our algorithm works, but we choose the subgraphs
$G_1, \ldots, G_k$ in a more complicated way and branch according to much more
complicated sets. The rules for choosing those branching sets are as follows, if the
graph is connected:

1. If there is a vertex x with degree 1, then branch according to N(x) (and
nothing else). There is no other branch, since there is always an optimal
vertex cover that contains N(x) and does not contain x.

2. If there is a vertex x with degree 6 or more, then branch according to x and
N(x).

3. If there are no vertices with degree 1 or at least 6, but there is a vertex with
degree 2, then proceed as shown in Section 4.

4. If 1.-3. do not apply and if the graph is regular, then choose some vertex x
with maximum degree and branch according to x and N(x). (This can happen
at most three times in each path of the search tree and increases its size
at most by a small constant factor.)

5. If 1.-4. do not apply and if there is a vertex with degree 3, then proceed as
shown in Section 5.

6. Otherwise, there must be a vertex with degree 4 and all other vertices have
degrees between 4 and 5. Proceed as shown in Section 6.

If the graph is not connected, then the algorithm chooses some component
G' and tests recursively if G' has a vertex cover of size k or less and, if it has,
finds out the optimal size k' of a vertex cover for G'. Then it proceeds to test
if $G - G'$, the other components, have a vertex cover of size $k - k'$. In this way
the algorithm finds out whether the whole graph has a vertex cover of size k.
4 Degree-2-Vertices

If the graph is 2-regular (and connected), all vertices constitute a cycle and it 
is very easy to construct an optimal cover in linear time. Otherwise let x be 
a vertex with degree 2 and a, b its neighbors, where a has degree at least 3. The
algorithm chooses the first of the following four cases that applies. 

Case 1. There is an edge between a and b or x has a bridge whose bridge vertex
has degree 2. Then include {a, b} into the vertex cover, which is optimal. No
branching is necessary.

Case 2. Assume that $|N(a) \cup N(b)| \ge 4$. Then branch according to {a, b} and
$N(a) \cup N(b)$, whose branching vector is at least (2,4).

Case 3. Assume x has exactly one bridge. Then a's degree must be 3 and b's
degree must be 2. Otherwise $|N(a) \cup N(b)| \ge 4$. Then there is an optimal cover
that does not contain both a and y, the bridge vertex: If a and y are part of
an optimal vertex cover then we can assume that x is also in the cover (but b
is not). Replacing a by N(a) produces another vertex cover that is not bigger.
Hence, we branch according to N(y) and N(a). The branching vector is at least
(3,3).

Case 4. Finally, let x have two bridges. Then the degrees of both a and b must
be 3 since otherwise $|N(a) \cup N(b)| \ge 4$. Let y and z be the bridge vertices. We
can branch according to y and N(y). If y is in an optimal cover, then including
y and z, but not a or b, is optimal, since two further vertices are necessary
anyway to cover all incident edges of a and b. Hence, we can branch according
to N(y) and {x,y,z} with a branching vector at least (3,3).
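
To compare branching vectors such as (2,4) and (3,3), it helps to turn them
into the base of the resulting search tree size. The following sketch assumes
the usual convention that a branching vector $(d_1, \ldots, d_r)$ stands for the
recurrence $T(k) = T(k - d_1) + \cdots + T(k - d_r)$, whose solution grows like
$\alpha^k$ for the unique $\alpha > 1$ with $\sum_i \alpha^{-d_i} = 1$. The function
name branching_number is hypothetical.

```python
def branching_number(vector, tol=1e-12):
    """Return the root alpha > 1 of sum(alpha**(-d) for d in vector) == 1."""
    if len(vector) < 2 or min(vector) <= 0:
        raise ValueError("need at least two branches with positive decrease")

    def excess(alpha):
        return sum(alpha ** (-d) for d in vector) - 1.0

    # excess is strictly decreasing for alpha > 0 and positive at alpha = 1,
    # so the root can be bracketed and then found by bisection.
    lo, hi = 1.0, 2.0
    while excess(hi) > 0:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For (3,3) the equation reads $2\alpha^{-3} = 1$, so the result is the cube root of 2;
for (2,4) it is the square root of the golden ratio. Both lie below the base
1.29175 quoted in the conclusion.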

5 Degree-3-Vertices 

In this section, the graph can contain vertices with degrees between 3 and 5. 
Particularly there must be at least one vertex with degree 3. Due to the lack of 
space, many details and proofs of correctness had to be omitted. 

For Cases 1, 2, 3, and 4 let x be such a vertex and let a, b, and c be its neigh- 
bors. The first four cases distinguish on the structure of the subgraph around x, 
in particular on the degree of its neighbors and whether x has triangles or bridges. 
Case 5 is different, it rather assumes that no vertices exist in the whole graph, 
for which one of the first four cases applies. 

Case 1. Assume that x is part of a triangle, e.g., let {x, a, b} be the triangle
(but there can be more triangles). Then we can branch according to N(x) and
N(c). If x is not part of the cover, N(x) is. If x is part of the cover, then so is
a or b. If c is also in the cover, then two neighbors of x are and we can replace
x by N(x). The branching vector is at least (3,3).
Case 2. Assume that x has at least two bridges (separate ones or a double
bridge). Let y and z be the middle vertices on the bridges. We can branch
according to N(x) and {x,y,z}. The branching vector is at least (3,3).



Case 3. Next, assume that x has exactly one bridge, let us say between a and b.
Call the center vertex on the bridge again y. Let us further assume that a or b
has degree 3, without loss of generality a. Then we branch according to N(x)
and N(a). The branching vector is at least (3,3).



Case 4. Now assume again that x has exactly one bridge as in the case above,
but both a and b have degrees of at least 4. Then we can branch according to
N(x), N(a), and {a, x, N(b), N(c)}. Since we can assume that x is not part of a
triangle and there is exactly one bridge, we get the branching vector (3,4,7).



Case 5. Finally, we can assume that there is no vertex with degree 3 that has 
a bridge or a triangle. 

Case 5.1. Assume that there is a vertex x with degree 3 and neighbors a, b,
c, two of which have degree at least 4, say, a and b. We pick either N(x) or
{x, N(a), N(b)} or {x, a, N(b), N(c)} or {x, b, N(a), N(c)}, using that, in order
to get an optimal vertex cover, at most one of the neighbors can be chosen
together with x. This yields the branching vector (3,7,7,7).

Case 5.2. Otherwise, we can assume that each degree 3 vertex has at most
one neighbor with degree at least 4. We assumed further in this section that the
graph is not regular and has at least one vertex with degree 3. Since the graph is
connected there must be some vertex with degree 3 that has exactly one neighbor
with degree 4 or 5.

Case 5.2.1 Let us assume that there is no cycle of length 5 with the following
two properties: (1) each vertex on the cycle has degree 3 and (2) there is a vertex
on the cycle that has a neighbor with degree at least 4. We choose some vertex
with degree 3 that has a neighbor with degree 4 or 5. Call this vertex $a_3$ and
the neighbor $b_3$. The other two neighbors of $a_3$ must have degree 3. From each
vertex with degree 3 we can inductively follow some path that consists solely of
degree-3 vertices: Just choose a neighbor with degree 3, but not the one you
came from. Start such a path from $a_3$ and call the vertices $a_2$, $a_1$, $a_0$. Start
another path and call the vertices $a_4$, $a_5$, $a_6$. Each of the $a_i$ has at least two
neighbors with degree 3, i.e., $a_{i-1}$ and $a_{i+1}$. The third neighbor is called $b_i$ and
might have degree 3, 4, or 5.

Figure 1 shows the resulting part of the graph. This picture does not
necessarily denote a subgraph of G: Firstly, all vertices $b_i$ are shown with degree 4,
but some of them might also have degree 3 or 5. On the other hand, we know
that all $a_i$ have exactly degree 3. Secondly, not all vertices shown in the picture
must necessarily be distinct. For example, we could have $b_1 = a_4$ (then $b_1$ would
have degree 3), since it does not violate any assumptions we made for this subcase.
The picture in Figure 1 is therefore merely a skeletal structure leaving open
many details. The freedom of these details lies in the variation of the degrees of
the $b_i$ and in pairs of vertices being identical.

Our algorithm's behavior must depend on these details, but mostly we branch
according to

1. $\{a_2, b_3, a_4\}$,

2. $\{a_1, b_2, a_3, b_4, a_5\}$,

3. $\{a_1, b_2, N(b_3), N(b_4), b_5, a_6\}$, and

4. $\{a_0, b_1, N(b_2), N(b_3), b_4, a_5\}$,

which can be found marked in Figure 1. The correctness is seen as follows: The
first branch handles the case that $a_3$ is not in the cover, the remaining branches
that it is. The second branch assumes that $a_2$ and $a_4$ are not in the cover. The
third branch assumes that $a_2$ is not, but $a_4$ is in the cover. We can then further
assume that $b_3$ is not in the cover, otherwise there is another optimal cover that
contains $N(a_3)$ instead of $a_3$. Moreover, we can assume that also $b_4$ and $a_5$ are
not part of the cover, since otherwise we can replace $a_4$ by $N(a_4)$, which is then
handled by the second branch. Hence, the neighbors of $b_3$, $b_4$, and $a_5$ are in the
cover and we get altogether $\{a_1, b_2, N(b_3), N(b_4), b_5, a_6\}$. The third and fourth
branches are symmetric.

The resulting branching vector is $(3, 5, n_1, n_2)$. Clearly $a_2$, $b_3$, $a_4$ are pairwise
distinct. Furthermore $b_2 \ne b_4$, $a_1 \ne b_4$, $b_2 \ne a_5$ (otherwise $a_3$ has a bridge),
making also $a_1, b_2, a_3, b_4, a_5$ pairwise distinct and yielding the first two
components of the branching vector. ($a_1, \ldots, a_5$ are pairwise distinct, since they do
not constitute a cycle.) We concentrate now on the third branch, since the fourth
one is quite similar to it and the same reasoning applies.

In $\{a_1, b_2, N(b_3), N(b_4), b_5, a_6\}$ we count 11 or 12 vertices, but some of them
might be identical. If we could prove that the size of this set is always at least 8,
we would get the branching vector (3,5,8,8), which is good enough. Unfortunately,
this is not possible. We proceed as follows: First we find out under what
circumstances the size of the set can be smaller than 8. It will turn out that there
is only one pathological possibility. Then we provide a different type of branching
suited exactly for this exception. If neither the third nor the symmetrical fourth
branch is pathological, we get a branching vector of (3,5,8,8) using the above
branching scheme. For the pathological cases we can even prove a branching vector
(3,5,7). We omit any details.

Case 5.2.2 Finally, we assume that there is a cycle of length 5 that consists
of vertices with degree 3 and at least one of them has a neighbor with degree
at least 4. This cycle shall consist of $a_0, \ldots, a_4$ with neighbors $b_0, \ldots, b_4$
outside the cycle, where $b_2$ is the neighbor with degree at least 4. We branch
by either picking $\{a_1, b_2, a_3\}$ or $\{a_0, b_1, a_2, b_3, a_4\}$ or $\{a_0, b_1, N(b_2), N(b_3), b_4\}$
or $\{b_0, N(b_1), N(b_2), b_3, a_4\}$. Similar considerations as in Case 5.2.1 show that we
can get branching vectors (3,5,8,8) or (3,5,7), in the latter case omitting the
fourth branching set.



6 Degree-4-Vertices 

In this section, we can assume that all vertices have either degree 4 or 5 and that 
there is at least one vertex with degree 4 and at least one vertex with degree 5. 
Again, several details had to be omitted. 



Case 1. Assume that there is a vertex x of degree 4 that is part of a triangle
and has a neighbor y with degree 5. Let a and b be the neighbors of x such that
{a, b, x} is a triangle (a or b can but need not coincide with y).

First, we assume that $a, b \ne y$. Let $c \notin \{a, b, y\}$ be another neighbor of x. We
can branch according to N(x), N(y), and {x, y, N(c)}: The resulting branching
vector is at least (4,5,4).

Now let us assume that a = y. Then we can further assume that the remaining
two neighbors c and d of x, i.e., not a or b, are not connected by an edge.
(Otherwise we can choose them to play the role of a and b above.) Branch
according to N(x), N(c), and {x, c, N(d)}, which is sufficient for the same reason
as above. The branching vector is again at least (4,4,5), because $c \notin N(d)$.



Case 2. We assume in this case that there is no vertex x with degree 4 that has
the following properties simultaneously: (1) x has a neighbor y with degree 5,
(2) x has at least two bridges that y is not part of (these might be independent
bridges or a double bridge). We can further assume that there are no triangles
that contain a vertex with degree 4 that has a neighbor with degree 5, i.e.,
that Case 1 does not apply. Choose some vertex x with degree 4 that has a
neighbor a with degree 5 (such a vertex exists, otherwise the graph would be
regular or not connected). Let b, c, d be the other neighbors of x (which can
have degrees between 4 and 5). We can branch according to N(a), N(x), and
{a, x}, but we split the last branch according to whether b and/or d are part
of the optimal vertex cover. Altogether, we branch according to N(a), N(x),
{x, a, N(b), N(d)}, {x, a, d, N(b), N(c)}, and {x, a, b, N(c), N(d)} (see Figure 2).
We get the branching vector (5,4,8,9,9) if x has no bridge that a is not part of.

Fig. 2. All marked vertices in each of the five branches are distinct if there are
no bridges that do not involve a. If there is such a bridge, we can assume it
matches two marked vertices in the last branch because of symmetry.

There might be one bridge that a is not part of. Without loss of generality
we can assume that this bridge goes over c and d (b, c, d are symmetric and can 
be mutually exchanged in the above branching scheme). Then all vertices in the 
third and fourth branch must be still mutually distinct, but c and d now share 
another neighbor besides x and therefore two of the marked vertices in the last 
branch coincide. This leaves a branching vector of (5, 4, 8, 9, 8). 

Case 3. Now we assume that there is a vertex x that has exactly the properties
that were forbidden in Case 2: It has degree 4 and two bridges. There is a
neighbor y with degree 5 that is not part of either of the two bridges. There
are two possibilities, which are depicted in Figure 3: Either x has two separate
bridges or a double bridge. The algorithm can branch according to N(y), N(x),
and {x, y}. In the last branch, if {x, y} is part of the cover, then we can assume
that the vertices on the bridge are members of the cover, too. Otherwise, their
neighbors would be part of the cover and that implies that at least three of x's
neighbors are in the vertex cover. Then, however, we can take N(x) instead of x.
That is, this cover is already handled by the first branch. Altogether, this implies
a branching vector (5,4,4).

7 Conclusion 

Improving on previous work, we presented the best algorithm known so far for
the NP-complete Vertex Cover problem, running in time $O(kn + 1.29175^k k^2)$.



Fig. 3. How to treat two separate bridges or a double bridge. 



Besides the theoretical interest, this result may also be relevant for practical
applications. For example, the Vertex Cover algorithm of Balasubramanian
et al. [2] has been implemented for applications in computational biology,
our algorithm now being a natural candidate to replace or complement it. Note
that our algorithm, like previous ones that are good in the worst case, has the
clear potential to perform much better on average.
Abstract. In this work an alternative online variant of the matching
problem in bipartite graphs is presented. It is motivated by a scheduling
problem in an online environment. In such an environment, a task is
unknown up to its disclosure. However, at that moment it is not necessary
to take a decision on the service of that particular task. In reality,
an online scheduler has to decide on how to use the current resources.
Therefore, our problem is called online request server matching (ORSM).
It differs substantially from the online bipartite matching problem of
Karp et al. Hence, the analysis of an optimal, deterministic
online algorithm for the ORSM problem results in a smaller competitive
ratio of 1.5.

We also introduce an extension to a weighted bipartite matching problem.
A lower bound of $(\sqrt{5}+1)/2 \approx 1.618$ and an upper bound of 2 are given
for the competitive ratio.



1 Introduction 

Motivation. The problem, which is investigated here, is motivated by online 
scheduling problems with deadlines. Such problems can be found in server sys- 
tems for continuous data streams (e. g. video transmissions). Such a server con- 
sists of a set of memory modules (e. g. hard disks) to store a huge amount of data 
and to provide a high bandwidth. The latter is necessary to serve a large num- 
ber of customers simultaneously. The memory modules are connected to a set 
of I/O-ports via a communication network inside the server system. Customers 
are connected to one of the I/O-ports in order to receive a demanded continuous 
data stream. It is clear that such a server is a real-time system and that the re- 
quests for data have hard deadlines. Data which violate their deadlines are not 
of interest anymore and can be omitted. Fortunately, the situation is relaxed a 
little bit because it seems to be acceptable when a small amount of the required 
data will not be delivered. A first question that needs to be solved concerns
the mapping of the data to the memory modules. To increase the usability of 
the system and to reduce the number of conflicts, the data or parts of it can be 
stored in more than one copy in different memory modules. Secondly, the access 
to memory modules and network resources has to be controlled. The difficulty of 

* Supported by DFG-Graduiertenkolleg “Parallele Rechnernetzwerke in der
Produktionstechnik”, GRK 124/2-96. This work was partly supported by the EU ESPRIT
Long Term Research Project 20244 (ALCOM-IT). 
these problems is increased by the online nature of the environment. The
continuous delivery of a data stream can be preempted by a customer or it can be
restarted at a completely different time index. Additionally, customers can ar- 
rive to the system or leave it at any time. So, future demands of server resources 
are unknown. However, scheduling decisions (assign appropriate resources to the 
requests) have to be taken under this uncertainty. 

It is not difficult to construct a formal model for such a server system includ- 
ing all dependencies. However, its analysis is a challenge. To start investigations 
we performed a process of abstractions and simplifications: A continuous data 
stream is divided into packages of roughly equal size. Under this assumption, 
a simple, discrete time model is used. Abstracting from additional restrictions 
that model a restricted communication network between memory modules and 
customers, the scheduling problem becomes a matching problem in a bipartite 
graph. The scheduler has to match requests for unit size data packages to
appropriate server resources and time slots. However, these decisions do not need
to be taken immediately when a request arrives. Instead, an online scheduler has to
determine how to use the current resources. 

In this work we start our studies with a very basic model for this problem and 
we call it online request server matching. It is highly idealized and therefore it 
is not able to represent the inherent structure of the initial problem. The ORSM 
model is a matching problem in a rather general bipartite graph in a new online 
variant. Nevertheless, the definition was selected in such a way, that an extension 
to more realistic models is possible. A development to a weighted matching 
problem is also discussed here and it is motivated by additional priorities of 
requests. 

To investigate our online problems, we apply competitive analysis. Roughly 
speaking, it determines the worst case ratio (called competitive ratio) of the 
solution quality of an online algorithm and an optimal solution over all inputs. 
Nowadays, it is an established method and it counterbalances results which 
assume an input distribution. A comprehensive introduction to online problems 
and competitive analysis is the textbook by Borodin and El-Yaniv [BEY98].

Previous and Related Work. Over the last years a vast amount of publications 
and single results on online scheduling can be found. gives an encyclope- 

dic survey. However, studies of scheduling models which allow deadlines are rare 
and cover very restricted cases only. None of them seems to be closely related 
to our model. The opposite is true for bipartite matching problems in online 
settings. The study of such problems was initiated by Karp, Vazirani and
Vazirani in [KVV90]. Their model consists of a ‘known’ and an ‘unknown’ partition.
Vertices of the unknown partition are revealed over time and an online algorithm
can put at most one edge of a just revealed vertex in the online matching. The
simple Greedy algorithm is an optimal, 2-competitive, deterministic online
algorithm for this problem. The key contribution of [KVV90] is the analysis of the
optimal, randomized algorithm Ranking which is $\approx 1.582$-competitive.

Various extensions and variants of this problem have been studied later. A 
comprehensive survey and further references can be found in [KEnH!. 
An online matching problem in general graphs is the roommates problem
lEESni: Guests arrive by and by at a hotel for a conference. The hotel consists 
of double rooms only. Every guest has a list of acceptable roommates (it is 
assumed that these lists are symmetric; they represent the adjacency lists of 
vertices in an undirected graph) . The manager must immediately assign a room 
to every guest and wants to minimize the number of occupied rooms. In lljhi.931 
an optimal, deterministic, 1.5-competitive algorithm is shown. For a weighted 
extension of the problem a lower bound of 3, and a 4-competitive algorithm are 
given. 

Results and Organization of the Material. In this work we investigate a different
online variant of the bipartite matching problem. The precise description can be
found in Sect. 2. Section 3 presents the results of the unweighted case. A lower
bound and a matching upper bound are given. It includes a 1.5-competitive
algorithm. The weighted extension of our model is studied in Sect. 4. It is shown
that the golden ratio ($\phi \approx 1.618$) is a lower bound for the competitive ratio.
A 2-competitive, deterministic online algorithm is presented as well as an outline
of its analysis. Due to space limitations, the proof of the upper bound had to be
omitted. A few concluding remarks in Sect. 5 complete this work.

2 The Model 

A formal definition of the online request server matching problem (ORSM) and 
its weighted extension (wORSM) is now presented. 

A bipartite graph $G := (R \cup S, E)$ represents the underlying structure of the
problem. Both partitions R and S are totally ordered. We denote the vertices by
$r_1, r_2, r_3, \ldots$ and by $s_1, s_2, s_3, \ldots$ with $r_i \in R$, $s_i \in S$, and the indices
indicating the position within the order. We interpret this order as a discrete
time model. The vertices of partition S represent a single resource called server.
It is available for one unit each time step. Partition R is interpreted as a set of
tasks. Such a task has a demand of one server unit to be completed. They are
called requests, and every time step one of them might occur. An edge $\{r_i, s_j\}$
between a request vertex $r_i$ and a server vertex $s_j$ means that request $r_i$ can be
served in time step j. The set of edges $E \subseteq R \times S$ is constructed with the
following restriction:

    $\{r_i, s_j\} \in E \Rightarrow i \le j$ .    (1)

This means a request that occurs at time step i must not specify a possible service
time in the past. Without this restriction the modelled scheduling problem does
not make sense and no competitive online algorithm exists.
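
As a small illustration of restriction (1), the following sketch stores an
instance as a dictionary from time steps to sets of admissible server slots.
This input format and the function names check_instance and edges_at_server
are assumptions made for the example and do not come from the paper.

```python
def check_instance(requests):
    """Raise ValueError if some request r_i names a server slot j < i."""
    for i, slots in requests.items():
        for j in slots:
            if j < i:
                raise ValueError(f"request {i} asks for past server slot {j}")


def edges_at_server(requests, i):
    """Return all edges (k, i) incident to server vertex s_i, sorted by request.

    If restriction (1) holds, every such edge belongs to a request r_k with
    k <= i, so the whole neighbourhood of s_i is known once r_i has arrived.
    """
    return sorted((k, i) for k, slots in requests.items() if i in slots)
```

With requests = {1: {2, 3}, 2: {2, 4}}, the edges at server slot 2 are (1, 2)
and (2, 2), both of which stem from requests that arrived no later than time 2.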

Now we have to specify how this model works online: When the system starts,
partition R is completely unknown¹. In the beginning of a time step i the request
$r_i$ is revealed as input, i.e., vertex $r_i$ and all of its edges are inserted into the
previously known part of G. If no request appears, vertex $r_i$ is isolated. After this
input, an algorithm has to decide on how to use the server in the current time
step i. It can add an edge incident to $s_i$ to the online matching M. It is worth
noting that due to restriction (1) all edges incident to $s_i$ are known when this
decision has to be taken. The online algorithm has the objective to maximize
the cardinality of matching M, i.e. to serve as many requests as it can.

¹ Strictly speaking, this is not quite true. Every time step i a new vertex is
inserted but its set of incident edges is unknown beforehand. For reasons of
convenience the input process is interpreted in the introduced way.

We want to emphasize the difference between the model in [KVV90] and ours.
Karp et al. study an online version of the bipartite matching problem where the 
input of a time step is a vertex out of the ‘unknown’ partition and all of its 
incident edges. Then, an online algorithm has to select at most one of these just 
revealed edges for the online matching. Contrarily, the decision on a matching 
edge in the ORSM problem is taken by selecting one of the edges which are 
incident to a vertex of the opposite partition. Nevertheless, the ORSM problem 
can be interpreted as a special case of the online bipartite matching. Then, the 
server partition is the ‘unknown’ partition and whenever a decision has to be 
taken, all edges incident to that vertex are known. Furthermore, all additional 
information provided by the ORSM model is ignored. When the focus is shifted, 
the ORSM problem can also be recognized as a special case of the roommates 
problem. The precise relationship is shown in plieOSj . 

So far, the graph G is unweighted. The objective changes when a weight
function $w : E \to \mathbb{R}^+$ is added. In this case, a matching with maximal total
weight has to be constructed. This version is called online request server weighted
matching problem or in short wORSM problem.

In the discussions and proofs of this work $M_{ALG}$ is used in order to denote an
online matching which is constructed by algorithm ALG, $|M|$ to denote the total
weight (cardinality respectively) of matching M, and OPT denotes an optimal
solution.



3 Analysis of the ORSM Problem 

This section starts with a general lower bound for the competitive ratio of 
the ORSM problem for deterministic online algorithms. Thereafter, the optimal 
1.5-competitive algorithm LMM is presented and analysed. 



3.1 The Lower Bound 

By applying the standard argument of an adversary strategy, we will show the 
following general lower bound: 

Theorem 1. Every deterministic online algorithm ALG for the ORSM problem 
has a competitive ratio of at least 1.5. 



Proof. The adversary strategy starts with the following input structure (see
Fig. 1): $E = \{\{r_1, s_2\}, \{r_1, s_3\}, \{r_2, s_2\}, \{r_2, s_4\}\}$.

Fig. 1. Situation at time t = 2.

(The figures show the situations at time t = 3, according to different decisions of ALG
and the reactions of the adversary; a double line is a matching edge; dotted lines are
edges which have been removed before t = 3.)

An online algorithm ALG can react to this input at time t = 2 in three
different ways:

Case 1 (Fig. 2): ALG puts edge $\{r_1, s_2\}$ to the online matching $M_{ALG}$. In the
next step the adversary presents edge $\{r_3, s_4\}$. ALG cannot use the server vertex
$s_3$. Therefore, $|M_{ALG}| \le 2$ whereas the optimal solution results in $|M_{OPT}| = 3$.

Case 2 (Fig. 3): ALG puts edge $\{r_2, s_2\}$ to the online matching $M_{ALG}$. In the next
step the adversary presents edge $\{r_3, s_3\}$. ALG cannot use $s_4$. Again $|M_{ALG}| \le 2$
and the maximum matching results in $|M_{OPT}| = 3$.

Case 3 (Fig. 4): ALG decides not to match $s_2$. The adversary will present the
input of Case 1 and $|M_{ALG}| \le 2$, $|M_{OPT}| = 3$ holds. Alternatively, it is fairly
obvious that ALG cannot take advantage of such a decision.

This strategy can be infinitely repeated every four time steps and this fact shows
the ratio

    $\frac{|M_{OPT}|}{|M_{ALG}|} \ge \frac{3}{2}$ .
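
The three cases of the proof can be checked mechanically on the four-step
gadget. The sketch below computes maximum matchings with a standard augmenting
path routine (Kuhn's algorithm, used here only as a convenient checker, not as
part of the paper's argument); edges are written as (request, server) pairs and
the helper names are hypothetical.

```python
def max_matching_size(edges):
    """Size of a maximum matching of the bipartite graph given by (request, server) edges."""
    adj = {}
    for r, s in edges:
        adj.setdefault(r, []).append(s)
    match_s = {}  # server -> request currently matched to it

    def augment(r, seen):
        for s in adj[r]:
            if s in seen:
                continue
            seen.add(s)
            if s not in match_s or augment(match_s[s], seen):
                match_s[s] = r
                return True
        return False

    return sum(augment(r, set()) for r in adj)


def best_after(edges, fixed):
    """Best total for ALG once it has irrevocably chosen the edge `fixed`."""
    r0, s0 = fixed
    rest = [(r, s) for r, s in edges if r != r0 and s != s0]
    return 1 + max_matching_size(rest)


base = [(1, 2), (1, 3), (2, 2), (2, 4)]
case1 = base + [(3, 4)]   # adversary's answer to ALG choosing {r_1, s_2}
case2 = base + [(3, 3)]   # adversary's answer to ALG choosing {r_2, s_2}

opt1, alg1 = max_matching_size(case1), best_after(case1, (1, 2))
opt2, alg2 = max_matching_size(case2), best_after(case2, (2, 2))
# Case 3: s_2 stays unused, so only server slots 3 and 4 remain for ALG.
alg3 = max_matching_size([e for e in case1 if e[1] >= 3])
```

In all three cases the optimum serves three requests while the online
algorithm is left with at most two, which gives the ratio 3/2 of Theorem 1.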



3.2 The Algorithm LMM 

At a time step i the graph G representing the input of an online algorithm is
known up to request vertex $r_i$. More precisely, the subgraph of G induced by the
set $\{r_k \mid r_k \in R, 1 \le k \le i\} \cup S$ is known. Due to the irreversible decisions taken
in former time steps all previous server vertices $s_1, \ldots, s_{i-1}$ and all hitherto
matched request vertices cannot be rearranged anymore. There remains a vertex
induced bipartite subgraph $B_i$ of G with the vertex set $V_i$:

    $V_i = \{r_k \mid r_k \in R,\ r_k \text{ not matched so far},\ 1 \le k \le i\} \cup \{s_k \mid s_k \in S,\ i \le k\}$ .
Our online algorithm is called ‘Local Maximum Matching’ (LMM) because
it constructs a maximum matching on every local subgraph $B_i$ (denoted by
$M(B_i)$). This can be done by searching and processing ‘augmenting paths’, i. e.
paths with unmatched end vertices and alternately sequent matching and non- 
matching edges (see e. g. m mfi5| or any comprehensive text book on algorithms 
for an explanation of how to do so). The exact function of LMM is: 

 1: loop {for all time steps i}
 2:   read input of time i and build up B_i
 3:   construct a maximum matching M(B_i) on B_i:
        start with all matching edges of M(B_{i-1}) which are edges in B_i;
 4:     look for an augmenting path which starts at vertex r_i and perform
        the augmentation when found²
 5:   if s_i is matched in M(B_i) then
 6:     add the matching edge of s_i to the online matching M_LMM
 7:   else if s_i is not isolated in B_i then {all neighbours of s_i are
        matched in M(B_i)}
 8:     add an arbitrary edge {s_i, r} of B_i to the matching M_LMM and delete
        the matching edge of r in M(B_i)
 9:   end if
10: end loop

Line 8 of this algorithm is essential and prefers the current server vertex $s_i$.
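
The pseudocode of LMM can be turned into a compact Python sketch. It is meant
as an illustration under stated assumptions, not as a reference implementation:
the input format (a dictionary from time step i to the set of server slots that
may serve $r_i$), the function name lmm, and the deterministic tie-breaking by
smallest index are all choices made for this example.

```python
def lmm(requests):
    """Run the LMM sketch and return the online matching as (request, server) pairs."""
    if not requests:
        return []
    horizon = max(max(requests),
                  max((max(s) for s in requests.values() if s), default=0))
    adj = {}       # unserved request -> set of its server slots
    match_r = {}   # local matching M(B_i): request -> server
    match_s = {}   # local matching M(B_i): server -> request
    online = []    # the online matching M_LMM

    def augment(r, now, seen):
        """Search an augmenting path from r using only servers s >= now (line 4)."""
        for s in sorted(adj[r]):
            if s < now or s in seen:
                continue
            seen.add(s)
            if s not in match_s or augment(match_s[s], now, seen):
                match_s[s] = r
                match_r[r] = s
                return True
        return False

    for i in range(1, horizon + 1):
        if i in requests:                       # lines 2-4: insert r_i and augment
            if any(j < i for j in requests[i]):
                raise ValueError("restriction (1) violated")
            adj[i] = set(requests[i])
            augment(i, i, set())
        if i in match_s:                        # lines 5-6: s_i is matched locally
            r = match_s.pop(i)
            del match_r[r]
        else:                                   # lines 7-8: prefer s_i anyway
            neighbours = sorted(r for r, slots in adj.items() if i in slots)
            if not neighbours:
                continue                        # s_i is isolated in B_i
            r = neighbours[0]                   # "arbitrary" edge: smallest request
            old = match_r.pop(r, None)
            if old is not None:
                del match_s[old]
        online.append((r, i))
        del adj[r]                              # r is served and leaves B_{i+1}
    return online
```

On the input {1: {2, 3}, 2: {2, 4}, 3: {3}}, which is the adversary's answer
from Case 2 of the lower bound, the sketch serves two requests while three
would be possible offline, matching the ratio 1.5.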



3.3 The Upper Bound 

To analyse the performance of LMM three observations are needed: 

1. After a request vertex has been matched in $B_i$ (in line 4 of LMM), it is
in all following maximum matchings³ up to the time step where its current
matching edge is added to $M_{LMM}$ (in line 6 or 8 of LMM).

2. If $s_i$ is not isolated in $B_i$, then $s_i$ is matched in $M_{LMM}$ (lines 5 to 8 of LMM).

3. Let M be a matching such that no augmenting path has length less than
$2\ell + 1$. Then a maximum matching $M_{OPT}$ satisfies $|M_{OPT}| \le \frac{\ell+1}{\ell} |M|$. (An
augmenting path of length $2\ell + 1$ has $\ell$ matching edges in M and $\ell + 1$ in
$M_{OPT}$.)

Theorem 2. The deterministic online algorithm LMM is 1.5-competitive. 

Proof. It will be shown that no online matching $M_{LMM}$ can be extended by
augmenting paths of length one or three. Therefore, shortest augmenting paths
for $M_{LMM}$ must have a length of at least five. This fact combined with
observation 3 completes the proof.

² Due to the maximum cardinality of $M(B_{i-1})$, every augmenting path must have $r_i$
at one end.

³ See the copy process of line 3 and remember that augmentations change matching
edges but do not remove vertices out of the matching.

Applying a complete distinction of cases, the non-existence of augmenting 
paths of length one and three in the online matching Mlmm is proven by con- 
tradiction. 



Fig. 5. Structure of augmenting paths of length one ($s_i$, $r_a$) and three
($s_i$, $r_a$, $s_j$, $r_b$).



Case 1: augmenting path of length one, $\{s_i, r_a\} \notin M_{LMM}$: Vertex $r_a$ was never
matched because $r_a$ is not in $M_{LMM}$ (reverse application of observation 1). So
$\{s_i, r_a\}$ is in $B_i$ and thus contradicts observation 2.

Case 2: augmenting path of length three and i < j: $r_a$ was not matched until
time j, which implies the edge $\{s_i, r_a\}$ was in $B_i$. This is a contradiction to
observation 2.

Case 3: augmenting path of length three and i > j: At time j, request vertices
$r_a$ and $r_b$ are not in $M_{LMM}$ and so the whole path $\{\{s_i, r_a\}, \{r_a, s_j\}, \{s_j, r_b\}\}$ is
in $B_j$. The case that $s_i$ is not in $M(B_j)$ contradicts the optimality of $M(B_j)$ because
the path is an augmenting one (see Fig. 5). Therefore, at time j, $s_i$ must be
matched in $M(B_j)$, i. e., there exists a request vertex $r_c$ with $\{r_c, s_i\} \in M(B_j)$.
Later at time k (j < k < i), line 8 of LMM deletes the matching edge $\{r_c, s_i\}$
and adds $\{s_k, r_c\}$ to $M_{LMM}$.

(Figure: the path $s_k$, $r_c$, $s_i$, $r_a$, $s_j$, $r_b$.)

Due to the definition of the ORSM problem, both edges $\{r_c, s_i\}$ and $\{r_c, s_k\}$
are known at time j. Now the above argument about $s_i$ can be recursively applied
to $s_k$ and due to the finite structure of $B_i$, this fact contradicts the existence of
an augmenting path of length three in $M_{LMM}$. □



4 The Weighted Model 

Similar to Sect. 3 a lower bound for the wORSM problem is shown first. Then
the algorithm wLMM is presented as well as an outline of its analysis.



4.1 A General Lower Bound 

Let $\phi := \frac{1+\sqrt{5}}{2} \approx 1.618$ be the golden ratio. 

Theorem 3. Every deterministic online algorithm ALG for the wORSM problem 
has a competitive ratio of at least $\phi = \frac{1+\sqrt{5}}{2}$. 
Fig. 6. Situation at time $t = 1$. 

Fig. 7. $\{r_1, s_1\} \in M_{ALG}$. 

Fig. 8. $M_{ALG} = \emptyset$. 

(Figs. 7 and 8 show the situations at time $t = 2$ after the different decisions and 
the reactions of the adversary; a double line is a matching edge; dotted lines are 
edges which have been removed before $t = 2$.) 



Proof. The adversary strategy starts with the input edges $\{r_1, s_1\}$ and 
$\{r_1, s_2\}$ with weights $w(\{r_1, s_1\}) = 1$ and $w(\{r_1, s_2\}) = \phi$ 
(see Fig. 6). ALG can react to this input at time $t = 1$ in two different ways: 

Case 1 (Fig. 7): ALG adds edge $\{r_1, s_1\}$ to the weighted online matching 
$M_{ALG}$. Thereafter, the adversary does not present any new edge incident to 
$s_2$. So $|M_{ALG}| = 1$ and $|M_{OPT}| = \phi$ hold. 

Case 2 (Fig. 8): ALG does not change the online matching. The adversary 
presents edge $\{r_2, s_2\}$ with weight $w(\{r_2, s_2\}) = \phi$. Now ALG can only 
construct a matching with weight $|M_{ALG}| \le \phi$, whereas $|M_{OPT}| = 1 + \phi$ 
holds. The ratio of these two values is also $\phi$. 

The adversary can repeat this strategy every two time steps indefinitely, 
which shows the lower bound of $\phi$ on the competitive ratio. □ 
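
Both ratios in the proof follow from the defining identity $\phi^2 = \phi + 1$ of the golden ratio. Written out as a short worked computation (added here for illustration, using the weights from the proof):

```latex
\[
\text{Case 1: } \frac{|M_{OPT}|}{|M_{ALG}|} = \frac{\phi}{1} = \phi ,
\qquad
\text{Case 2: } \frac{|M_{OPT}|}{|M_{ALG}|} \ge \frac{1+\phi}{\phi}
  = \frac{\phi^{2}}{\phi} = \phi .
\]
```

This also suggests why the weight $\phi$ is a natural choice for the edge $\{r_1, s_2\}$: for a general weight $x > 1$ the two cases yield the ratios $x$ and $(1+x)/x$, and these coincide exactly when $x^2 = x + 1$.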



4.2 The Algorithm wLMM 

The algorithm wLMM works similarly to LMM. wLMM computes a maximum 
weighted matching on the local bipartite graph $B_i$. In contrast to LMM, the 
algorithm works without the special preference for vertex $s_i$. The preference 
used by LMM cannot help in the weighted problem because the weights of edges 
can differ by very small values. The problems arising from this fact will be 
demonstrated in the following analysis. The formal description of wLMM is: 



i ← 0 
loop 
    i ← i + 1 
    read input of time step i and build up B_i 
    construct a maximum weighted matching M(B_i) on B_i 
    if s_i is matched in M(B_i) then 
        add the matching edge of s_i to the online weighted matching M_wLMM 
    end if 
end loop 
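
The loop above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the author's implementation: the input format (`steps[i-1]` lists the edges revealed at time step `i` as `(request, server_index, weight)` triples), the rule that $B_i$ contains the known edges between still unmatched requests and servers $s_i, s_{i+1}, \ldots$, and the brute-force helper `max_weight_matching` (exponential time, suitable for tiny instances only) are all assumptions made for this example.

```python
def max_weight_matching(edges):
    """Brute-force maximum weighted matching (assumed helper, tiny inputs only).

    edges -- list of (request, server, weight) triples
    Returns (total_weight, list_of_chosen_edges).
    """
    if not edges:
        return 0.0, []
    (r, s, w), rest = edges[0], edges[1:]
    # Option 1: do not use the first edge.
    best_w, best_m = max_weight_matching(rest)
    # Option 2: use it and drop every edge sharing one of its endpoints.
    compatible = [e for e in rest if e[0] != r and e[1] != s]
    take_w, take_m = max_weight_matching(compatible)
    if w + take_w > best_w:
        best_w, best_m = w + take_w, [(r, s, w)] + take_m
    return best_w, best_m


def wlmm(steps):
    """Sketch of wLMM: returns the online matching as a list of edges."""
    known = []   # edges of the local bipartite graph
    online = []  # the online weighted matching M_wLMM
    for i, new_edges in enumerate(steps, start=1):
        # Read the input of time step i and build up B_i (assumed rule).
        known.extend(new_edges)
        matched = {r for r, _, _ in online}
        known = [e for e in known if e[0] not in matched and e[1] >= i]
        # Construct a maximum weighted matching M(B_i) on B_i.
        _, local_matching = max_weight_matching(known)
        # If s_i is matched in M(B_i), fix its matching edge online.
        for r, s, w in local_matching:
            if s == i:
                online.append((r, s, w))
    return online
```

For example, with `steps = [[("r1", 1, 1.0), ("r1", 2, 1.1)], [("r2", 2, 1.0)]]` the sketch leaves $s_1$ unmatched at time 1 and matches $r_1$ to $s_2$ at time 2, so the online matching has weight 1.1 although a matching of weight 2 exists.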



Footnote: Taking advantage of such small weight differences, an adversary can 
force this kind of ‘clever’ online algorithm to make the same decisions as wLMM. 
To do so, the weights of the input edges need to be changed by a very small 
$\varepsilon$ only. 
The Lower Bound of wLMM 



Theorem 4. The competitive ratio of wLMM cannot be less than 2. 



Proof. A simple input structure is applied to show the theorem. This structure 
(which can be repeated arbitrarily often) is the following: $w(\{r_1, s_1\}) = 1$, 
$w(\{r_1, s_2\}) = 1 + \varepsilon$, and $w(\{r_2, s_2\}) = 1$. 

The online algorithm wLMM is too greedy and matches $r_1$ to $s_2$. Then 
$|M_{wLMM}| = 1 + \varepsilon$ holds, whereas $|M_{OPT}| = 2$. Therefore, the 
resulting lower bound $2/(1+\varepsilon)$ on the competitive ratio exceeds any 
fixed real number less than 2 for a sufficiently small value of $\varepsilon$. □ 
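
How small $\varepsilon$ has to be can be made explicit. The following short computation is an added illustration; the symbol $c$ for the fixed real number with $0 < c < 2$ is introduced here and does not appear in the original.

```latex
\[
\frac{|M_{OPT}|}{|M_{wLMM}|} = \frac{2}{1+\varepsilon} > c
\quad\Longleftrightarrow\quad
\varepsilon < \frac{2}{c} - 1 .
\]
```

For instance, for $c = 1.9$ any $\varepsilon < 2/1.9 - 1 \approx 0.0526$ suffices.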



The Upper Bound of wLMM 

Due to strict space limitations the next theorem is presented without a proof. It 
can be found in [?]. 

Theorem 5. The deterministic online algorithm wLMM is 2-competitive. 

The proof is based on a technique developed in [?]. It makes extensive use of 
arguments about how dynamic changes of the local bipartite graph (insertion and 
deletion of edges) modify the structure and properties of augmenting paths. The 
lemmata describing these characteristics are not difficult but rather lengthy 
and somewhat technical. We therefore omit the whole proof, because presenting 
it without the preparatory explanations would make little sense. However, this 
proof may be of minor interest only. Theorems 3 and 5 reveal a large gap in the 
analysis of the wORSM problem. We have the feeling that the upper bound can be 
improved. To this end, an online algorithm would have to prefer the current 
server vertex, e.g. by increasing the weights of its edges. Unfortunately, such 
an algorithm causes different changes in the local bipartite graph and our proof 
collapses completely. Hence, a different proof technique needs to be developed. 

5 Concluding Remarks 

Very often the following question arises: Can a randomized algorithm perform 
‘better’? However, randomized online algorithms are studied in a different model 
of competitive analysis (weaker adversary models), and that was not the subject 
of our investigations. Nevertheless, a simple extension of the lower bound 
construction of Theorem [?] immediately shows a $\frac{6}{5} = 1.2$ lower bound 
for the competitive ratio of randomized online algorithms against the oblivious 
adversary. The two input structures used by the adversary in Theorem [?] are 
chosen independently and uniformly at random to build up the input of the next 
four time steps. It is also not difficult to transform LMM into a randomized 
online algorithm: it selects one of the proper maximum cardinality matchings on 
$B_i$ at random. However, we do not have an improved analysis yet. Note that a 
straightforward application of the Ranking algorithm [KVV90] cannot achieve an 
improved bound for the ORSM problem. 

A different direction for further research is to study restricted model variants. 
It is possible to model additional concepts like parallel resources, lookahead, or 
individual time windows for the service of requests by restricting the structure 
of the set of edges. Some of these model variants are more closely related to 
real-world problems, and for a few of them smaller competitive ratios have 
already been obtained. 